Tag Archives: measurement

New European Standard for Social Impact Measurement


Evaluation has truly become a global movement. The number of evaluators and evaluation associations around the world is growing, and they are becoming more interconnected. What affects evaluation in one part of the world increasingly affects how it is practiced in another.

That is why the European standard for social impact measurement, announced just a few weeks ago, is important for evaluators in the US.

According to the published report and its accompanying press release, the immediate purpose of the standard is to help social enterprises access EU financial support, especially in relation to the European Social Entrepreneurship Funds (EuSEFs) and the Programme for Employment and Social Innovation (EaSI).

But as László Andor, EU Commissioner for Employment, Social Affairs and Inclusion, pointed out, there is a larger purpose:

The new standard…sets the groundwork for social impact measurement in Europe. It also contributes to the work of the Taskforce on Social Impact Investment set up by the G7 to develop a set of general guidelines for impact measurement to be used by social impact investors globally.

That is big, and it has the potential to affect evaluation around the world.

What is impact measurement?

For evaluators in the US, the term impact measurement may be unfamiliar. It has greater currency in Europe and, of late, in Canada. Defining the term precisely is difficult because, as an area of practice, impact measurement is evolving quickly.

Around the world, there is a growing demand for evaluations that incorporate information about impact, values, and value. It is coming from government agencies, philanthropic foundations, and private investors who want to increase their social impact by allocating their public or private funds more efficiently.

Sometimes these funders are called impact investors. In some contexts, the label signals a commitment to grant making that incorporates the tools and techniques of financial investors. In others, it signals a commitment by private investors to a double bottom line—a social return on their investment for others and a financial return for themselves.

These funders want to know if people are better off in ways that they and other stakeholders believe are important. Moreover, they want to know whether those impacts are large enough and important enough to warrant the funds being spent to produce them. In other words, did the program add value?

Impact measurement may engage a wide range of stakeholders to define the outcomes of interest, but the overarching definition of success—that the program adds value—is typically driven by funders. Value may be assessed with quantitative, qualitative, or mixed methods, but almost all of the impact measurement work that I have seen has framed value in quantitative terms.

Is impact measurement the same as evaluation?

I consider impact measurement a specialized practice within evaluation. Others do not. Geographic and disciplinary boundaries have tended to isolate those who identify themselves as evaluators from those who conduct impact measurement—often referred to as impact analysts. These two groups are beginning to connect, like evaluators of every kind around the world.

I like to think of impact analysts and evaluators as twins who were separated at birth and then, as adults, accidentally bump into each other at the local coffee shop. They are delighted and confused, but mostly delighted. They have a great deal to talk about.

How is impact measurement different from impact evaluation?

There is more than one approach to impact evaluation. There is what we might call traditional impact evaluation—randomized controlled trials and quasi-experiments as described by Shadish, Cook, and Campbell. There are also many recently developed alternatives—contribution analysis, evaluation of collective impact, and others.

Impact measurement differs from traditional and alternative impact evaluation in a number of ways, among them:

  1. how impacts are estimated and
  2. a strong emphasis on valuation.

I discuss both in more detail below. Briefly, impacts are frequently estimated by adjusting outcomes for a pre-established set of potential biases, usually without reference to a comparison or control group. Valuation estimates the importance of impacts to stakeholders—the domain of human values—and expresses it in monetary units.

These two features are woven into the European standard and have the potential to become standard practices elsewhere, including the US. If they were to be incorporated into US practice, it would represent a substantial change in how we conduct evaluations.

What is the new European standard?

The standard creates a common process for conducting impact measurement, not a common set of impacts or indicators. The five-step process presented in the report is surprisingly similar to Tyler’s seven-step evaluation procedure, which he developed in the 1930s as he directed the evaluation of the Eight-Year Study across 30 schools. For its time, Tyler’s work was novel and the scale impressive.


Tyler’s evaluation procedure, developed in the 1930s, and the new European standard process: déjà vu all over again?

Tyler’s first two steps were formulating and classifying objectives (what do programs hope to achieve and which objectives can be shared across sites to facilitate comparability and learning). Deeply rooted in the philosophy of progressive education, he and his team identified the most important stakeholders—students, parents, educators, and the larger community—and conducted much of their work collaboratively (most often with teachers and school staff).

Similarly, the first two steps of the European standard process are identifying objectives and stakeholders (what does the program hope to achieve, who benefits, and who pays). They are to be implemented collaboratively with stakeholders (funders and program staff chief among them) with an explicit commitment to serving the interests of society more broadly.

Tyler’s third and fourth steps were defining outcomes in terms of behavior and identifying how and where the behaviors could be observed. The word behavior was trendy in Tyler’s day. What he meant was developing a way to observe or quantify outcomes. This is precisely setting relevant measures, the third step of the new European standard process.

Tyler’s fifth and sixth steps were selecting, trying, proving, and improving measures as they function in the evaluation. Today we would call this piloting, validation, and implementation. The corresponding step in the standard is measure, validate and value, only the last of these falling outside the scope of Tyler’s procedure.

Tyler concluded his procedure with interpreting results, which for him included analysis, reporting, and working with stakeholders to facilitate the effective use of results. The new European standard process concludes in much the same way, with reporting results, learning from them, and using them to improve the program.

How are impacts estimated?

Traditional impact evaluation defines an impact as the difference in potential outcomes—the outcomes participants realized with the program compared to the outcomes they would have realized without the program.

It is impossible to observe both of these mutually exclusive conditions at the same time. Thus, all research designs can be thought of as hacks, some more elegant than others, that allow us to approximate one condition while observing the other.

The European standard takes a similar view of impacts and describes a good research design as one that takes the following into account:

  • attribution, the extent to which the program, as opposed to other programs or factors, caused the outcomes;
  • deadweight, outcomes that, in the absence of the program, would have been realized anyway;
  • drop-off, the tendency of impacts to diminish over time; and
  • displacement, the extent to which outcomes realized by program participants prevent others from realizing those outcomes (for example, when participants of a job training program find employment, it reduces the number of open jobs and as a result may make it more difficult for non-participants to find employment).

For any given evaluation, many research designs may meet the above criteria, some with the potential to provide more credible findings than others.

However, impact analysts may not be free to choose the research design with the potential to provide the most credible results. According to the standard, the cost and complexity of the design must be proportionate to the size, scope, cost, potential risks, and potential benefits of the program being evaluated. In other words, impact analysts must make a difficult tradeoff between credibility and feasibility.

How well are analysts making the tradeoff between credibility and feasibility?

At the recent Canadian Evaluation Society Conference, my colleagues Cristina Tangonan, Anna Fagergren (not pictured), and I addressed this question. We described the potential weaknesses of research designs used in impact measurement generally and Social Return on Investment (SROI) analyses specifically. Our work is based on a review of publicly available SROI reports (to date, 107 of 156 identified reports) and theoretical work on the statistical properties of the estimates produced.

At the CES 2014 conference.

What we have found so far leads us to question whether the credibility-feasibility tradeoffs are being made in ways that adequately support the purposes of SROI analyses and other forms of impact measurement.

One design that we discussed starts with measuring the outcome realized by program participants: for example, the number of participants in a job training program who found employment, or the test scores of students enrolled in a new education program. Sometimes impact analysts measure the outcome as a pre-program/post-program difference; often they measure the post-program outcome level on its own.

Once the outcome measure is in hand, impact analysts adjust it for attribution, deadweight, drop-off, and displacement by subtracting some amount or percentage for each potential bias. The adjustments may be based on interviews with past participants, prior academic or policy research, or sensitivity analysis. Rarely are they based on comparison or control groups constructed for the evaluation. The resulting adjusted outcome measure is taken as the impact estimate.
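
To make that procedure concrete, here is a minimal sketch, in Python, of an adjustment-based estimate of this kind. It assumes the common convention of applying each adjustment as a percentage reduction; every figure is hypothetical and not drawn from any actual SROI report.

```python
# A minimal sketch of an adjustment-based impact estimate as described above.
# All figures are hypothetical illustrations, not data from any SROI report.

def adjusted_impact(outcome, attribution, deadweight, drop_off, displacement):
    """Reduce a measured outcome by each potential bias (each expressed as a
    proportion between 0 and 1) and return the adjusted impact estimate."""
    impact = outcome
    impact *= attribution          # share of the outcome credited to the program
    impact *= (1 - deadweight)     # remove outcomes that would have occurred anyway
    impact *= (1 - drop_off)       # allow for impacts expected to fade over time
    impact *= (1 - displacement)   # remove outcomes merely shifted from non-participants
    return impact

# Hypothetical job training program: 200 participants employed after the program.
estimate = adjusted_impact(
    outcome=200,
    attribution=0.70,   # 70% of the outcome attributed to the program
    deadweight=0.25,    # 25% would have found work anyway
    drop_off=0.10,      # 10% of the effect expected to fade
    displacement=0.05,  # 5% of the jobs displaced from non-participants
)
print(round(estimate, 1))  # ~89.8 participant-jobs counted as impact
```

Note that nothing in this calculation requires a comparison or control group; the credibility of the estimate rests entirely on the credibility of the adjustment percentages.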

This is an example of a high-feasibility, low-credibility design. Is it good enough for the purposes that impact analysts have in mind? Perhaps, but I’m skeptical. There is a century of systematic research on estimating impacts—why didn’t this method, which is much more feasible than many alternatives, become a standard part of evaluation practice decades ago? I believe it is because the credibility of the design (or, more accurately, of the results it can produce) is considered too low for most purposes.

From what I understand, this design (and others that are similar) would meet the European standard. That leads me to question whether the new standard has set the bar too low, unduly favoring feasibility over credibility.

What is valuation?

In the US, I believe we do far less valuation than is currently being done in Europe and Canada. Valuation expresses the value of impacts (their importance to stakeholders) in monetary units.

If the outcome were, for example, earned income, then valuation would entail little more than estimating the impact as we usually would, because the outcome is already expressed in monetary units. If the outcome were health, happiness, or well-being, valuation would be more complicated. In this case, we would need to translate non-monetary units into monetary units in a way that accurately reflects the relative value of impacts to stakeholders. No easy feat.

In some cases, valuation may help us gauge whether the monetized value of a program’s impact is large enough to matter. It is difficult to defend spending $2,000 per participant of a job training program that, on average, results in additional earned income of $1,000 per participant. Participants would be better off if we gave $2,000 to each.
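
As a toy illustration of that comparison, the check is simply whether the monetized impact per participant exceeds the cost per participant; the figures repeat the hypothetical ones in the paragraph above.

```python
# Toy check of whether a program "adds value", using the hypothetical figures
# from the paragraph above; not a full SROI or cost-benefit analysis.

cost_per_participant = 2000.0     # dollars spent per participant
value_per_participant = 1000.0    # monetized impact: additional earned income

net_value = value_per_participant - cost_per_participant         # -1000.0 dollars
value_per_dollar = value_per_participant / cost_per_participant  # 0.50 returned per dollar spent

print(net_value, value_per_dollar)
```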

At other times, valuation may not be useful. For example, if one health program saves more lives than another, I don’t believe we need to value lives in dollars to judge their relative effectiveness.

Another concern is that valuation reduces the certainty of the final estimate (in monetary units) as compared to an impact estimate on its own (in its original units). That is a topic that I discussed at the CES conference, and will discuss again at the conferences of the European Evaluation Society, the Social Impact Analysts Association, and the American Evaluation Association.

There is more to this than I can hope to address here. In brief—the credibility of a valuation can never be greater than the credibility of the impact estimate upon which it is based. Call that Gargani’s Law.

If ensuring the feasibility of an evaluation results in impact estimates with low credibility (see above), we should think carefully before reducing credibility further by expressing the impact in monetary units.
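
One way to see why, sketched with entirely hypothetical numbers: if the monetized estimate is the product of an uncertain impact estimate and an uncertain dollar value per unit of outcome, the relative uncertainty of the product is larger than that of either factor alone.

```python
# A rough Monte Carlo sketch of why monetizing an impact cannot make it more
# credible: multiplying an uncertain impact estimate by an uncertain dollar
# value per unit of outcome widens the relative uncertainty. All figures and
# distributions are hypothetical.

import random
import statistics

random.seed(1)

def cv(values):
    """Coefficient of variation: a simple measure of relative uncertainty."""
    return statistics.stdev(values) / statistics.mean(values)

n = 100_000
impacts = [random.gauss(100, 15) for _ in range(n)]     # impact estimate: 100 units, sd 15
unit_values = [random.gauss(50, 10) for _ in range(n)]  # dollars per unit: 50, sd 10
monetized = [i * v for i, v in zip(impacts, unit_values)]

print(f"relative uncertainty of impact estimate: {cv(impacts):.2f}")      # ~0.15
print(f"relative uncertainty of unit value:      {cv(unit_values):.2f}")  # ~0.20
print(f"relative uncertainty of monetized value: {cv(monetized):.2f}")    # ~0.25
```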

Where do we go from here?

The European standard sets out to solve a problem that is intrinsic to our profession: stakeholders with different perspectives are constantly struggling to come to agreement about what makes an evaluation good enough for the purposes they have in mind. In the case of the new standard, I fear the bar may be set too low, tipping the balance in favor of feasibility over credibility.

That is, of course, speculation. But so too is believing the balance is right or that it is tipped in the other direction. What is needed is a program of research—research on evaluation—that helps us understand whether the tradeoffs we make bear the fruit we expect.

The lack of research on evaluation is a weak link in the chain of reasoning that makes our work matter in Europe, the US, and around the world. My colleagues and I are hoping to strengthen that link a little, but we need others to join us. I hope you will.


Measurement Is Not the Answer


Bill Gates recently summarized his yearly letter in an article for the Wall Street Journal entitled My Plan to Fix the World’s Biggest Problems…Measure Them!

As an evaluator, I was thrilled. I thought, “Someone with clout is making the case for high-quality evaluation!” I was ready to love the article.

To my great surprise, I didn’t.

The premise of the piece was simple. Organizations working to change the world should set clear goals, choose an approach, measure results, and use those measures to continually refine the approach.

At this level of generality, who could disagree? Certainly not evaluators—we make arguments like this all the time.

Yet, I must—with great disappointment—conclude that Gates failed to make the case that measurement matters. In fact, I believe he undermined it by the way he used measurements.

Gates is not unique in this respect. His Wall Street Journal article is just one instance of a widespread problem in the social sector—confusing good measures with good inference.

Measures versus Inference

The difference between measures and inferences can be subtle. Measures quantify something that is observable. The number of students who graduate from high school or estimates of the calories people consume are measures. In order to draw conclusions from measures, we make inferences.  Two types of inference are of particular interest to evaluators.

(1) Inferences from measures to constructs. Constructs—unobservable aspects of humans or the world that we seek to understand—and the measures that shed light on them are not interchangeable. For example, what construct does the high school graduation rate measure? That depends. Possibly education quality, student motivation, workforce readiness, or something else that we cannot directly observe. To make an inference from measure to construct, the construct of interest must be well defined and its measure selected on the basis of evidence.

Evidence is important because, among other things, it can suggest whether many, few, or only one measure is required to understand a construct well. By using the sole measure of calories consumed, for example, we gain a poor understanding of a broad construct like health. However, we can use that single measure to gain a critical understanding of a narrower construct like risk of obesity.

(2) Inferences from measures to impacts. If high school graduation rates go up, was it the result of new policies, parental support, another reason left unconsidered, or a combination of several reasons? This sort of inference represents one of the fundamental challenges of program evaluation, and we have developed a number of strategies to address it. None is perfect, but more often than not we can identify a strategy that is good enough for a specific context and purpose.

Why do I think Gates made weak inferences from good measures? Let’s look at the three examples he offered in support of his premise that measurement is the key to solving the world’s biggest problems.

Example 1: Ethiopia

Gates described how Ethiopia became more committed to providing healthcare services in 2000 as part of the Millennium Development Goals. After that time, the country began tracking the health services it provided in new ways. As evidence that the new measurement strategy had an impact, Gates reported that child mortality in Ethiopia has decreased by 60% since 1990.

In this example, the inference from measure to impact is not warranted. Based on the article, the sole reason to believe that the new health measurement strategy decreased child mortality is that the former happened before the latter. Inferring causality from the sequential timing of events alone has been recognized as an inferential misstep for so long that it is best known by its Latin name, post hoc ergo propter hoc.

Even if we were willing to make causal inferences based on sequential timing alone, it would not be possible in this case—the tracking system began sometime after 2000 while the reported decrease in child mortality was measured from 1990.

Example 2: Polio

The global effort to eradicate polio has come down to three countries—Nigeria, Pakistan, and Afghanistan—where immunizing children has proven especially difficult. Gates described how new measurement strategies, such as using technology to map villages and track health workers, are making it possible to reach remote, undocumented communities in these countries.

It makes sense that these measurement strategies should be a part of the solution. But do they represent "another story of success driven by better measurement," as Gates suggests?

Maybe yes, maybe no—the inference from measure to impact is again not warranted, but for different reasons.

In the prior example, Gates was looking back, claiming that actions (in the past) made an impact (in the past) because the actions preceded the impact. In this example, he made the claim that ongoing actions will lead to a future impact because the actions precede the intended impact of eradicating polio. The former was a weak inference, and the latter is weaker still because it incorporates speculation about the future.

Even if we are willing to trust an inference about an unrealized future in which polio has been eradicated, there is another problem. The measures Gates described are implementation measures. Inferring impact from implementation may be warranted if we have strong faith in a causal mechanism, in this case that contact with remote communities leads to immunization which in turn leads to reduction in the transmission of the disease.

We should have strong faith in the second step of this causal mechanism—vaccines work. Unfortunately, we should have doubts about the first step because many who are contacted by health workers refuse immunization. The Bulletin of the World Health Organization reported that parental refusal in some areas around Karachi has been widespread, accounting for 74% of missed immunizations there. The refusals are believed to stem from fears about the safety of the vaccines and their religious implications. New strategies for mapping and tracking cannot, on the face of it, address these concerns.

So I find it difficult to accept that polio immunization is a story of success driven by measurement. It seems more like a story in which new measures are being used in a strategic manner. That’s laudable—but quite different from what was claimed.

Example 3: Education

The final example Gates provided came from the foundation’s $45 million Measures of Effective Teaching (MET) study. As described in the article, the MET study concluded that multiple measures of teacher effectiveness can be used to improve the way administrators manage school systems and teachers provide instruction. The three measures considered in the study were standardized test scores (transformed into controversial units called value-added scores), student surveys of teacher quality, and scores provided by trained observers of classroom instruction.

The first problem with this example is the inference from measures to construct. Everyone wants more effective teachers, but not everyone defines effectiveness the same way. There are many who disagree with how the construct of teacher effectiveness was defined in the MET study—that a more effective teacher is one who promotes student learning in ways that are reflected by standardized test scores.

Even if we accept the MET study’s narrow construct of teacher effectiveness, we should question whether multiple measures are required to understand it well. As reported by the foundation, all three measures in combination explain about 52% of the variation in teacher effectiveness in math and 26% in English-language arts. Test scores alone (transformed into value-added scores) explain about 48% and 20% of the variation in math and English-language arts, respectively. The difference is trivial, making the cost of gathering additional survey and observation data difficult to justify.

The second problem is inference from measures to impact. Gates presented Eagle County’s experience as evidence that teacher evaluations improve education. He stated that Eagle County’s teacher evaluation system is “likely one reason why student test scores improved in Eagle County over the past five years.” Why does he believe this is likely? He doesn’t say. I can only respond post hoc ergo propter hoc.

So What?

The old chestnut that absence of evidence is not evidence of absence applies here. Although Gates made inferences that were not well supported by logic and evidence, it doesn’t mean he arrived at the wrong conclusions. Or the right conclusions. All we can do is shrug our shoulders.

And it doesn’t mean we should not be measuring the performance and impact of social enterprises. I believe we should.

It does mean that Gates believes in the effectiveness of potential solutions for which there is little evidence. For someone who is arguing that measurement matters, he is setting a poor example. For someone who has the power to implement solutions on an unprecedented scale, it can also be dangerous.


Conference Blog: The Wharton “Creating Lasting Change” Conference

How can corporations promote the greater good?  Can they do good and be profitable?  How well can we measure the good they are doing?

These were some of the questions explored at a recent Wharton School Conference entitled Creating Lasting Change: From Social Entrepreneurship to Sustainability in Retail.  I provide a brief recap of the event.  Then I discuss why I believe program evaluators, program designers, and corporations have a great deal to learn from each other.

The Location

The conference took place at Wharton’s stunning new San Francisco campus.  By stunning I mean drop-dead gorgeous.

An Unusual and Effective Conference

The conference was jointly organized by three entities within the Wharton School—the Jay H. Baker Retailing Center, the Initiative for Global Environmental Leadership, and the Wharton Program for Social Impact.

When I first read this I scratched my head.  A conference that combined the interests of any two made sense to me.  Combining the interests of all three seemed like a stretch.  I found—much to my delight—that the conference worked very well because of its two-panel structure.

Panel 1 addressed the social and environmental impact of new ventures; Panel 2 addressed the impact of large, established corporations.  This offered an opportunity to compare and contrast new with old, small with large, and risk takers with the risk averse.

Fascinating and enlightening.  I explain why after I describe the panels.

Panel 1: Social Entrepreneurship/Innovation

The first panel considered how entrepreneurs and venture capitalists can promote positive environmental and social change.

  • Andrew D’Souza, Chief Revenue Officer at Top Hat Monocle, discussed how his company developed web-based clickers for classrooms and online homework tools that are designed to promote learning—a social benefit that can be directly monetized.
  • Mike Young, Director of Technology Development at Innova Dynamics, described how his company’s social mission drives their development and commercialization of “disruptive advanced materials technologies for a sustainable future.”
  • Amy Errett, Partner at the venture capital firm Maveron, emphasized the firm’s belief that businesses focusing on a social mission tend to achieve financial success.
  • Susie Lee, Principal at TBL Capital, outlined her firm’s patient capital approach, which favors companies that balance their pursuit of social, environmental, and financial objectives.
  • Raghavan Anand, Chief Financial Officer at One Million Lights, moderated the panel.

Panel 2: Sustainability/CSR in the Retail Industry

The second panel discussed how large, established companies impact society and the natural world, and what it means for a corporation to act responsibly.

Christy Consler, Vice President of Sustainability at Safeway Inc., made the case that the large grocer (roughly 1,700 stores and 180,000 employees) needs to focus on sustainable, socially responsible operations to ensure that it has dependable sources for its product—food—as the world population swells by 2 billion over the next 35 years.

Lori Duvall, Director of Operational Sustainability at eBay Inc., summarized eBay’s sustainability efforts, which include solar power installations, reusable packaging, and community engagement.

Paul Dillinger, Senior Director-Global Design at Levi Strauss & Co., made an excellent presentation on the social and environmental consequences—positive and negative—of the fashion industry, and how the company is working to make a positive impact.

Shauna Sadowski, Director of Sustainability at Annie’s (you know, the company that makes the cute organic, bunny-shaped mac and cheese), discussed how bringing natural foods to the marketplace motivates sustainable, community-centered operations.

Barbara Kahn moderated.  She wins the prize for having the longest title—the Patty & Jay H. Baker Professor, Professor of Marketing; Director, Jay H. Baker Retailing Center—and from what I could tell, she deserves every bit of the title.

Measuring Social Impact

I was thrilled to find corporations, new and old, concerned with making the world a better place.  Business in general, and Wharton in particular, have certainly changed in the 20 years since I earned my MBA.

The unifying theme of the panels was impact.  Inevitably, that discussion turned from how corporations were working to make social and environmental impacts to how they were measuring impacts.  When it did, the word evaluation was largely absent, being replaced by metrics, measures, assessments, and indicators.  Evaluation, as a field and a discipline, appears to be largely unknown to the corporate world.

Echoing what I heard at the Harvard Social Enterprise Conference (day 1 and day 2), impact measurement was characterized as nascent, difficult, and elusive.  Everyone wants to do it; no one knows how.

I find this perplexing.  Is the innovation, operational efficiency, and entrepreneurial spirit of American corporations insufficient to crack the nut of impact measurement?

Without a doubt, measuring impact is difficult—but not for the reasons one might expect.  Perhaps the greatest challenge is defining what one means by impact.  This venerable concept has become a buzzword, signifying both more and less than it should for different people in different settings.  Clarifying what we mean simplifies the task of measurement considerably.  In this setting, two meanings dominated the discussion.

One was the intended benefit of a product or service.  Top Hat Monocle’s products are intended to increase learning.  Annie’s foods are intended to promote health.  Evaluators are familiar with this type of impact and how to measure it.  Difficult?  Yes.  It poses practical and technical challenges, to be sure.  Nascent and elusive?  No.  Evaluators have a wide range of tools and techniques that we use regularly to estimate impacts of this type.

The other dominant meaning was the consequences of operations.  Evaluators are probably less familiar with this type of impact.

Consider Levi’s.  In the past, 42 liters of fresh water were required to produce one pair of Levi’s jeans.  According to Paul Dillinger, the company has since produced about 13 million pairs using a more water-efficient process, reducing the total water required for these jeans from roughly 546 million liters to 374 million liters—an estimated savings of 172 million liters.

Is that a lot?  The Institute of Medicine estimates that one person requires about 1,000 liters of drinking water per year (2.2 to 3 liters per day, depending on a variety of assumptions)—so Levi’s saved enough drinking water for about 172,000 people for one year.  Not bad.
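
For readers who want to retrace the arithmetic, here it is laid out as a quick check; the per-pair, production, and per-person figures come from the paragraphs above, and only the rounding is mine.

```python
# Quick check of the water figures cited above; the inputs come from the post.

liters_per_pair_old = 42          # liters of fresh water per pair, old process
pairs_produced = 13_000_000       # pairs made with the more efficient process
liters_used_new = 374_000_000     # liters used by the new process

liters_old_process = liters_per_pair_old * pairs_produced  # ~546 million liters
liters_saved = liters_old_process - liters_used_new        # ~172 million liters

liters_per_person_year = 1_000    # drinking water per person per year (IOM estimate)
people_saved_for_a_year = liters_saved / liters_per_person_year        # ~172,000
people_still_used_for_a_year = liters_used_new / liters_per_person_year  # ~374,000

print(liters_saved, people_saved_for_a_year, people_still_used_for_a_year)
```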

But operational impact is more complex than that.  Levi’s still used the equivalent yearly drinking water for 374,000 people in places where potable water may be in short supply.  The water that was saved cannot be easily moved where it may be needed more for drinking, irrigation, or sanitation.  If the water that is used for the production of jeans is not handled properly, it may contaminate larger supplies of fresh water, resulting in a net loss of potable water.  The availability of more fresh water in a region can change behavior in ways that negate the savings, such as attracting new industries that depend on water or inducing wasteful water consumption practices.

Is it difficult to measure operational impact?  Yes.  Even estimating something as tangible as water use is challenging.  Elusive?  No.  We can produce impact estimates, although they may be rough.  Nascent?  Yes and no.  Measuring operational impact depends on modeling systems, testing assumptions, and gauging human behavior.  Evaluators have a long history of doing these things, although not in combination for the purpose of measuring operational impact.

It seems to me that evaluators and corporations could learn a great deal from each other.  It is a shame these two worlds are so widely separated.

Designing Corporate Social Responsibility Programs

With all the attention given to estimating the value of corporate social responsibility programs, the values underlying them were not fully explored.  Yet the varied and often conflicting values of shareholders and stakeholders pose the most significant challenge facing those designing these programs.

Why do I say that?  Because it has been that way for over 100 years.

The concept of corporate social responsibility has deep roots.  In 1909, William Tolman wrote about a trend he observed in manufacturing.  Many industrialists, by his estimation, were taking steps to improve the working conditions, pay, health, and communities of their employees.  He noted that these unprompted actions had various motives—a feeling that workers were owed the improvements, unqualified altruism, or the belief that the efforts would lead to greater profits.

Tolman placed a great deal of faith in the last motive.  Too much faith.  Twentieth-century industrial development was not characterized by rational, profit-maximizing companies competing to improve the lot of stakeholders in order to increase the wealth of shareholders.  On the contrary, making the world a better place typically entailed tradeoffs that shareholders found unacceptable.

So these early efforts failed.  The primary reason was that their designs did not align the values of shareholders and stakeholders.

Can the values of shareholders and stakeholders be more closely aligned today?  I believe they can be.  The founders of many new ventures, like Top Hat Monocle and Innova Dynamics, bring different values to their enterprises.  For them, Tolman’s nobler motives—believing that people deserve a better life and a desire to do something decent in the world—are the cornerstones of their company cultures.  Even in more established organizations—Safeway and Levi’s—there appears to be a cultural shift taking place.  And many venture capital firms are willing to take a patient capital approach, waiting longer and accepting lower returns, if it means they can promote a greater social good.

This is change for the better.  But I wonder if we, like Tolman, are putting too much faith in win-win scenarios in which we imagine shareholders profit and stakeholders benefit.

It is tempting to conclude that corporate social responsibility programs are win-win.  The most visible examples, like those presented at this conference, are.  What lies outside of our field of view, however, are the majority of rational, profit-seeking corporations that are not adopting similar programs.  Are we to conclude that these enterprises are not as rational as they should be? Or have we yet to design corporate responsibility programs that resolve the shareholder-stakeholder tradeoffs that most companies face?

Again, there seems to be a great deal that program designers, who are experienced at balancing competing values, and corporations can learn from each other…if only the two worlds met.


Should We Fear Subjectivity?


Like many this summer, I found myself a bit perplexed by the way Olympic athletes in many sports received scores. It was not so much the scoring systems per se that had me flummoxed, although they were far from simple. Rather it was realizing that, while the systems for scoring gymnastics, ice skating, boxing, and sailing had been overhauled over the past few years in an effort to remedy troubling flaws, the complaint that these scores are subjective — and by extension unfair — lingered.

This dissatisfaction reflects an unwritten rule that applies to our efforts to evaluate the quality or merit of any human endeavor: if the evaluation is to be perceived as fair, it must demonstrate that it is not subjective. But is this a useful rule? Before we can wrestle with that question, we need to consider what we mean by subjective and why we feel compelled to avoid it.
