Category Archives: Design

Confessions of a Conference Junkie

It’s true—I am addicted to conferences. While I read about evaluation, write about evaluation, and do evaluations in my day-to-day professional life, it’s not enough. To truly connect to the field and its swelling ranks of practitioners, researchers, and supporters, I need to attend conferences. Compulsively. Enthusiastically. Constantly.

Over the past few months, I was honored to be the keynote speaker at the Canadian Evaluation Society conference in Toronto and at the Danish Evaluation Society conference in Kolding. Over the past two years I have traveled from Helsinki to Honolulu to speak, present, and give workshops. The figure below shows some of that travel (conferences indicated with darker circles, upcoming travel with dashed lines).

[Figure: travel network diagram]

But today is special—it’s the first day of the American Evaluation Association conference in Washington, DC. If conferences were cities, this one would be New York—big, vibrant, and international.

[Image: AEA 2013 conference program]

And this year, in addition to my presentations, receptions, and workshops (here and here), I will attempt to do something I have never done before—blog from the conference.

EvalBlog has been quiet this summer. Time to make a little digital noise.


Filed under AEA Conference, Conference Blog, Design, Evaluation, Program Design, Program Evaluation

Conference Blog: The American Evaluation Association Conference About to Kick Off

It’s been a busy few months for me.  I have been leading workshops, making presentations, attending conferences, and working in Honolulu, Helsinki, London, Tallinn (Estonia), and Claremont.  I met some amazing people and learned a great deal about how evaluation is being practiced around the world.  More about this in later posts.

This morning, I am in Minneapolis for the Annual Conference of the American Evaluation Association, which begins today. While I am here, I will be reporting on the latest trends, techniques, and opportunities in evaluation.

Today will be interesting.  I lead a half-day workshop on program design with Stewart Donaldson. Then I chair a panel discussion on the future of evaluation (a topic that, to my surprise, has mushroomed from a previous EvalBlog post  into a number of conference presentations and a website).

Off to the conference–more later.


Filed under AEA Conference, Design, Evaluation, Program Design, Program Evaluation

Conference Blog: Catapult Labs 2012

Did you miss the Catapult Labs conference on May 19?  Then you missed something extraordinary.

But don’t worry, you can get the recap here.

The event was sponsored by Catapult Design, a nonprofit firm in San Francisco that uses the process and products of design to alleviate poverty in marginalized communities.  Their work spans the worlds of development, mechanical engineering, ethnography, product design, and evaluation.

That is really, really cool.

I find them remarkable and their approach refreshing.  Even more so because they are not alone.  The conference was very well attended by diverse professionals—from government, the nonprofit sector, the for-profit sector, and design—all doing similar work.

The day was divided into three sets of three concurrent sessions, each presented as hands-on labs.  So, sadly, I could attend only one third of what was on offer.  My apologies to those who presented and are not included here.

I started the day by attending Democratizing Design: Co-creating With Your Users, presented by Catapult’s Heather Fleming.  It provided an overview of techniques designers use to include stakeholders in the design process.

Evaluators go to great lengths to include stakeholders.  We have broad, well-established approaches such as empowerment evaluation and participatory evaluation.  But the techniques designers use are largely unknown to evaluators.  I believe there is a great deal we can learn from designers in this area.

An example is games.  Heather organized a game in which we used beans as money.  Players chose which crops to plant, each with its own associated cost, risk profile, and potential return.  The expected payoff varied by gender, which was arbitrarily assigned to players.  After a few rounds the problem was clear—higher costs, lower returns, and greater risks for women increased their chances of financial ruin, and this had negative consequences for communities.
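To make the mechanics concrete, here is a rough simulation sketch of the dynamic the game sets up. Every number in it (crops, costs, failure risks, payoffs, and the size of the penalty) is invented for illustration; these are not the values from Heather’s workshop.

```python
import random

# A rough simulation of the bean-game dynamics described above.
# All numbers are invented for illustration; they are not the values
# from Heather's workshop.
CROPS = {
    # crop: (planting cost, probability of crop failure, payoff if it succeeds)
    "maize":   (3, 0.2, 6),
    "cassava": (2, 0.1, 3),
    "cotton":  (5, 0.4, 12),
}

def play_season(beans, crop, penalty=0.0):
    """Play one season; `penalty` raises costs and risk and lowers returns
    (the game assigned larger penalties to players designated as women)."""
    cost, p_fail, payoff = CROPS[crop]
    beans -= cost * (1 + penalty)
    if beans < 0:
        return 0  # ruin: cannot cover planting costs
    if random.random() > p_fail + 0.5 * penalty:  # crop succeeds
        beans += payoff * (1 - penalty)
    return beans

def ruin_rate(penalty, players=10_000, seasons=10, starting_beans=10):
    """Share of simulated players who go broke within `seasons` rounds."""
    ruined = 0
    for _ in range(players):
        beans = starting_beans
        for _ in range(seasons):
            beans = play_season(beans, random.choice(list(CROPS)), penalty)
            if beans <= 0:
                ruined += 1
                break
    return ruined / players

print("ruin rate with no penalty:   ", ruin_rate(0.0))
print("ruin rate with a 20% penalty:", ruin_rate(0.2))
```

Even this toy version reproduces the pattern described above: the penalized players go broke more often, round after round.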

I believe that evaluators could put games to good use.  Describing a social problem as a game requires stakeholders to express their cause-and-effect assumptions about the problem.  Playing with a group allows others to understand those assumptions intimately, comment upon them, and offer suggestions about how to solve the problem within the rules of the game (or perhaps change the rules to make the problem solvable).

I have never met a group of people who were more sincere in their pursuit of positive change.  And honest in their struggle to evaluate their impact.  I believe that impact evaluation is an area where evaluators have something valuable to share with designers.

That was the purpose of my workshop Measuring Social Impact: How to Integrate Evaluation & Design.  I presented a number of techniques and tools we use at Gargani + Company to design and evaluate programs.  They are part of a more comprehensive program design approach that Stewart Donaldson and I will be sharing this summer and fall in workshops and publications (details to follow).

The hands-on format of the lab made for a great experience.  I was able to watch participants work through the real-world design problems that I posed.  And I was encouraged by how quickly they were able to use the tools and techniques I presented to find creative solutions.

That made my task of providing feedback on their designs a joy.  We shared a common conceptual framework and were able to speak a common language.  Given the abstract nature of social impact, I was very impressed with that—and their designs—after less than 90 minutes of interaction.

I wrapped up the conference by attending Three Cups, Rosa Parks, and the Polar Bear: Telling Stories that Work, presented by Melanie Moore Kubo and Michaela Leslie-Rule from See Change.  They use stories as a vehicle for conducting (primarily) qualitative evaluations.  They call it story science.  A nifty idea.

I liked this session for two reasons.  First, Melanie and Michaela are expressive storytellers, so it was great fun listening to them speak.  Second, they posed a simple question—Is this story true?—that turns out to be amazingly complex.

We summarize, simplify, and translate meaning all the time.  Those of us who undertake (primarily) quantitative evaluations agonize over this because our standards for interpreting evidence are relatively clear but our standards for judging the quality of evidence are not.

For example, imagine that we perform a t-test to estimate a program’s impact.  The t-test indicates that the impact is positive, meaningfully large, and statistically significant.  We know how to interpret this result and what story we should tell—there is strong evidence that the program is effective.
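As a minimal sketch of that scenario (with simulated outcome scores standing in for real program data), the analysis might look like this:

```python
import numpy as np
from scipy import stats

# Simulated outcome scores for illustration only; a real evaluation
# would use measured outcomes for program and comparison groups.
rng = np.random.default_rng(42)
program = rng.normal(loc=75, scale=10, size=120)     # program group
comparison = rng.normal(loc=68, scale=10, size=120)  # comparison group

t, p = stats.ttest_ind(program, comparison)
cohens_d = (program.mean() - comparison.mean()) / np.sqrt(
    (program.var(ddof=1) + comparison.var(ddof=1)) / 2
)  # pooled-SD effect size (equal group sizes)

print(f"t = {t:.2f}, p = {p:.4f}, Cohen's d = {cohens_d:.2f}")
# A positive, large, statistically significant result supports the
# "program works" story -- but only if the measure and data are sound.
```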

But what if the outcome measure was not well aligned with the program’s activities? Or there were many cases with missing data?  Would our story still be true?  There is little consensus on where to draw the line between truth and fiction when quantitative evidence is flawed.

As Melanie and Michaela pointed out, it is critical that we strive to tell stories that are true, but equally important to understand and communicate our standards for truth.  Amen to that.

The icing on the cake was the conference evaluation.  Perhaps the best conference evaluation I have come across.

Everyone received four post-it notes, each a different color.  As a group, we were given a question to answer on a post-it of a particular color, and only a minute to answer the question.  Immediately afterward, the post-its were collected and displayed for all to view, as one would view art in a gallery.

Evaluation as art—I like that.  Immediate.  Intimate.  Transparent.

Gosh, I like designers.


Filed under Conference Blog, Design, Evaluation, Program Design, Program Evaluation

Measuring Impact: Integrating Evaluation & Design (Workshop May 19 in SF)

Interested in design for social change?  Curious about how to measure the social impact of your designs?  Check out my upcoming San Francisco workshop—Measuring Impact: Integrating Evaluation & Design–taking place on May 19 as part of CatapultLabs: Design Tools to Spark Social Change.

Come join in a day of hands-on labs with leading designers and organizations promoting social change.

Learn more about it at http://catapultlabs-2012.eventbrite.com/–space is limited.


Filed under Conference Blog, Design, Evaluation, Program Design, Program Evaluation

Conference Blog: The Wharton “Creating Lasting Change” Conference

How can corporations promote the greater good?  Can they do good and be profitable?  How well can we measure the good they are doing?

These were some of the questions explored at a recent Wharton School Conference entitled Creating Lasting Change: From Social Entrepreneurship to Sustainability in Retail.  I provide a brief recap of the event.  Then I discuss why I believe program evaluators, program designers, and corporations have a great deal to learn from each other.

The Location

The conference took place at Wharton’s stunning new San Francisco campus.  By stunning I mean drop-dead gorgeous.  Here is one of its many views.

An Unusual and Effective Conference

The conference was jointly organized by three entities within the Wharton School—the Jay H. Baker Retailing Center, the Initiative for Global Environmental Leadership, and the Wharton Program for Social Impact.

When I first read this I scratched my head.  A conference that combined the interests of any two made sense to me.  Combining the interests of all three seemed like a stretch.  I found—much to my delight—that the conference worked very well because of its two-panel structure.

Panel 1 addressed the social and environmental impact of new ventures; Panel 2 addressed the impact of large, established corporations.  This offered an opportunity to compare and contrast new with old, small with large, and risk takers with the risk averse.

Fascinating and enlightening.  I explain why after I describe the panels.

Panel 1: Social Entrepreneurship/Innovation

The first panel considered how entrepreneurs and venture capitalists can promote positive environmental and social change.

  • Andrew D’Souza, Chief Revenue Officer at Top Hat Monocle, discussed how his company developed web-based clickers for classrooms and online homework tools that are designed to promote learning—a social benefit that can be directly monetized.
  • Mike Young, Director of Technology Development at Innova Dynamics, described how his company’s social mission drives their development and commercialization of “disruptive advanced materials technologies for a sustainable future.”
  • Amy Errett, Partner at the venture capital firm Maveron, emphasized the firm’s belief that businesses focusing on a social mission tend to achieve financial success.
  • Susie Lee, Principal at TBL Capital, outlined her firm’s patient capital approach, which favors companies that balance their pursuit of social, environmental, and financial objectives.
  • Raghavan Anand, Chief Financial Officer at One Million Lights, moderated the panel.

Panel 2: Sustainability/CSR in the Retail Industry

The second panel discussed how large, established companies impact society and the natural world, and what it means for a corporation to act responsibly.

Christy Consler, Vice President of Sustainability at Safeway Inc., made the case that the large grocer (roughly 1,700 stores and 180,000 employees) needs to focus on sustainable, socially responsible operations to ensure that it has dependable sources for its product—food—as the world population swells by 2 billion over the next 35 years.

Lori Duvall, Director of Operational Sustainability at eBay Inc., summarized eBay’s sustainability efforts, which include solar power installations, reusable packaging, and community engagement.

Paul Dillinger, Senior Director-Global Design at Levi Strauss & Co., made an excellent presentation on the social and environmental consequences—positive and negative—of the fashion industry, and how the company is working to make a positive impact.

Shauna Sadowski, Director of Sustainability at Annie’s (you know, the company that makes the cute organic, bunny-shaped mac and cheese), discussed how bringing natural foods to the marketplace motivates sustainable, community-centered operations.

Barbara Kahn moderated.  She wins the prize for having the longest title—the Patty & Jay H. Baker Professor, Professor of Marketing; Director, Jay H. Baker Retailing Center—and from what I could tell, she deserves every bit of the title.

Measuring Social Impact

I was thrilled to find corporations, new and old, concerned with making the world a better place.  Business in general, and Wharton in particular, have certainly changed in the 20 years since I earned my MBA.

The unifying theme of the panels was impact.  Inevitably, that discussion turned from how corporations were working to make social and environmental impacts to how they were measuring impacts.  When it did, the word evaluation was largely absent, being replaced by metrics, measures, assessments, and indicators.  Evaluation, as a field and a discipline, appears to be largely unknown to the corporate world.

Echoing what I heard at the Harvard Social Enterprise Conference (day 1 and day 2), impact measurement was characterized as nascent, difficult, and elusive.  Everyone wants to do it; no one knows how.

I find this perplexing.  Is the innovation, operational efficiency, and entrepreneurial spirit of American corporations insufficient to crack the nut of impact measurement?

Without a doubt, measuring impact is difficult—but not for the reasons one might expect.  Perhaps the greatest challenge is defining what one means by impact.  This venerable concept has become a buzzword, signifying both more and less than it should for different people in different settings.  Clarifying what we mean simplifies the task of measurement considerably.  In this setting, two meanings dominated the discussion.

One was the intended benefit of a product or service.  Top Hat Monocle’s products are intended to increase learning.  Annie’s foods are intended to promote health.  Evaluators are familiar with this type of impact and how to measure it.  Difficult?  Yes.  It poses practical and technical challenges, to be sure.  Nascent and elusive?  No.  Evaluators have a wide range of tools and techniques that we use regularly to estimate impacts of this type.

The other dominant meaning was the consequences of operations.  Evaluators are probably less familiar with this type of impact.

Consider Levi’s.  In the past, 42 liters of fresh water were required to produce one pair of Levi’s jeans.  According to Paul Dillinger, the company has since produced about 13 million pairs using a more water-efficient process, reducing the total water required for these jeans from roughly 546 million liters to 374 million liters—an estimated savings of 172 million liters.

Is that a lot?  The Institute of Medicine estimates that one person requires about 1,000 liters of drinking water per year (2.2 to 3 liters per day making a variety of assumptions)—so Levi’s saved enough drinking water for about 172,000 people for one year.  Not bad.
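For readers who like to check the arithmetic, here is the back-of-the-envelope calculation using only the figures cited above:

```python
# Back-of-the-envelope check of the figures cited above.
liters_per_pair_old = 42          # liters of fresh water per pair (old process)
pairs_produced = 13_000_000       # pairs made with the water-efficient process
liters_saved = 172_000_000        # estimated savings

old_total = liters_per_pair_old * pairs_produced  # ~546 million liters
new_total = old_total - liters_saved              # ~374 million liters

drinking_water_per_person_year = 1_000  # ~2.2-3 liters/day, rounded
people_equivalent_saved = liters_saved / drinking_water_per_person_year
people_equivalent_used = new_total / drinking_water_per_person_year

print(f"old total: {old_total / 1e6:.0f}M liters, new total: {new_total / 1e6:.0f}M liters")
print(f"savings equal a year's drinking water for ~{people_equivalent_saved:,.0f} people")
print(f"remaining use equals a year's drinking water for ~{people_equivalent_used:,.0f} people")
```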

But operational impact is more complex than that.  Levi’s still used the equivalent of a year’s drinking water for 374,000 people in places where potable water may be in short supply.  The water that was saved cannot be easily moved where it may be needed more for drinking, irrigation, or sanitation.  If the water that is used for the production of jeans is not handled properly, it may contaminate larger supplies of fresh water, resulting in a net loss of potable water.  The availability of more fresh water in a region can change behavior in ways that negate the savings, such as attracting new industries that depend on water or inducing wasteful water consumption practices.

Is it difficult to measure operational impact?  Yes.  Even estimating something as tangible as water use is challenging.  Elusive?  No.  We can produce impact estimates, although they may be rough.  Nascent?  Yes and no.  Measuring operational impact depends on modeling systems, testing assumptions, and gauging human behavior.  Evaluators have a long history of doing these things, although not in combination for the purpose of measuring operational impact.

It seems to me that evaluators and corporations could learn a great deal from each other.  It is a shame these two worlds are so widely separated.

Designing Corporate Social Responsibility Programs

With all the attention given to estimating the value of corporate social responsibility programs, the values underlying them were not fully explored.  Yet the varied and often conflicting values of shareholders and stakeholders pose the most significant challenge facing those designing these programs.

Why do I say that?  Because it has been that way for over 100 years.

The concept of corporate social responsibility has deep roots.  In 1909, William Tolman wrote about a trend he observed in manufacturing.  Many industrialists, by his estimation, were taking steps to improve the working conditions, pay, health, and communities of their employees.  He noted that these unprompted actions had various motives—a feeling that workers were owed the improvements, unqualified altruism, or the belief that the efforts would lead to greater profits.

Tolman placed a great deal of faith in the last motive.  Too much faith.  Twentieth-century industrial development was not characterized by rational, profit-maximizing companies competing to improve the lot of stakeholders in order to increase the wealth of shareholders.  On the contrary, making the world a better place typically entailed tradeoffs that shareholders found unacceptable.

So these early efforts failed.  The primary reason was that their designs did not align the values of shareholders and stakeholders.

Can the values of shareholders and stakeholders be more closely aligned today?  I believe they can be.  The founders of many new ventures, like Top Hat Monocle and Innova Dynamics, bring different values to their enterprises.  For them, Tolman’s nobler motives—believing that people deserve a better life and a desire to do something decent in the world—are the cornerstones of their company cultures.  Even in more established organizations—Safeway and Levi’s—there appears to be a cultural shift taking place.  And many venture capital firms are willing to take a patient capital approach, waiting longer and accepting lower returns, if it means they can promote a greater social good.

This is change for the better.  But I wonder if we, like Tolman, are putting too much faith in win-win scenarios in which we imagine shareholders profit and stakeholders benefit.

It is tempting to conclude that corporate social responsibility programs are win-win.  The most visible examples, like those presented at this conference, are.  What lies outside of our field of view, however, are the majority of rational, profit-seeking corporations that are not adopting similar programs.  Are we to conclude that these enterprises are not as rational as they should be? Or have we yet to design corporate responsibility programs that resolve the shareholder-stakeholder tradeoffs that most companies face?

Again, there seems to be a great deal that program designers, who are experienced at balancing competing values, and corporations can learn from each other…if only the two worlds met.


Filed under Commentary, Conference Blog, Design, Evaluation, Program Design, Program Evaluation

Conference Blog: The Harvard Social Enterprise Conference (Day 2)

What follows is a second series of short posts written while I attended the Social Enterprise Conference (#SECON12).  The conference (February 25-26) was presented by the Harvard Business School and the Harvard Kennedy School.

I spent much of the day attending a session entitled ActionStorm: A Workshop on Designing Actionable Innovations.  Suzi Sosa, Executive Director of the Dell Social Innovation Challenge, did a great job of introducing a design thinking process for those developing new social enterprises.  I plan to blog more about design thinking in a future post.

The approach presented in the workshop combined basic design thinking activities (mind mapping, logic modeling, and empathy mapping) that I believe can be of value to program evaluators as well as program designers.

I wonder, however, how well these methods fit the world of grant-funded programs.  Increasingly, the guidelines for grant proposals put forth by funding agencies specify the core elements of a program’s design.  It is common for funders to specify the minimum number of contact hours, desired length of  service, and required service delivery methods.  When this is the case, designers may have little latitude to innovate, closing off opportunities to improve quality and efficiency.

Evaluation moment #4: Suzi constantly challenged us to specify how, where, and why we would measure the impact of the social enterprises we were discussing.  It was nice to see someone advocating for evaluation “baked into” program designs.  The participants were receptive, but they seemed somewhat daunted by the challenge of measuring impact.

Next, I attended Taking Education Digital: The Impact of Sharing Knowledge.  Chris Dede (Harvard Graduate School of Education) moderated.  I have always found his writing insightful and thought-provoking, and he did not disappoint today.  He provided a clear, compelling call for using technology to transform education.

His line of reasoning, as I understand it, is this: the educational system, as currently structured, lacks the capacity to meet federal and state mandates to increase (1) the quality of education delivered to students and (2) desired high school and college graduation rates.  Technology can play a transformational role by increasing the quality and capacity of the educational system.

Steve Carson followed by describing his work with MIT OpenCourseWare, which illustrated very nicely the distinction between innovation and transformation.  MIT OpenCourseWare was, at first, a humble idea–use the web to make it easier for MIT students and faculty to share learning-related materials.  Useful, but not innovative (as the word is typically used).

It turned out that the OpenCourseWare materials were being used by a much larger, more diverse group of formal and informal learners for wonderful, unanticipated educational purposes.  So without intending to, MIT had created a technology with none of the trappings of innovation yet tremendous potential to be transformational.

The moral of the story: social impact can be achieved in unexpected ways, and in cultures that value innovation, the most unexpected way is to do something unexceptional exceptionally well.

Next, Chris Sprague (OpenStudy) discussed his social learning startup.  OpenStudy connects students to each other–so far 150,000 from 170 countries–in ways that promote learning.  Think of it as a worldwide study hall.

Social anything is hot in the tech world, but this is more than Facebook dressed in a scholar’s robe.  The intent is to create meaningful interactions around learning, tap expertise, and spark discussions that build understanding.  Think about how much you can learn about a subject simply by having a cup of coffee with an expert.  Imagine how much more you could learn if you were connected to more experts and did not need to sit next to them in a cafe to communicate.

The Pitch for Change took place in the afternoon.  It was the culmination of a process in which young social entrepreneurs give “elevator pitches” describing new ventures.  Those with the best pitches are selected to move on to the next round, where they make another pitch.

To my eyes, the final round combined the most harrowing elements of job interviews and Roman gladiatorial games–one person enters the arena, fights for survival for three minutes, and then looks to the crowd for thumbs up or down (see the picture at the top of this entry). Of course, they don’t use thumbs–that would be too BC (before connectivity).  Instead, they use smartphones to vote via the web.

At the end, the winners were given big checks (literally, the checks were big; the dollar amounts, not so much).

But winners receive more than a little seed capital.  The top two winners are fast-tracked to the semifinal round of the 2013 Echoing Green Fellowship, the top four winners are fast-tracked to the semifinal round of the 2012 Dell Social Challenge, and the project that best makes use of technology to solve a social or environmental problem wins the Dell Technology Award.  Not bad for a few minutes in the arena.

Afterward, Dr. Judith Rodin, President of the Rockefeller Foundation, made the afternoon keynote speech, which focused on innovation.  She is a very good speaker and the audience was eager to hear about the virtues of new ideas.  It went over well.

Evaluation moment #5: Dr. Rodin made the case for measuring social impact.  She described it as essential to traditional philanthropy and more recent efforts around social impact investing.  She noted that Rockefeller is developing its capacity in this area; however, evaluation remains a tough nut to crack.

The last session of the day was fantastic–and not just because an evaluator was on the panel.  It was entitled If at First You Don’t Succeed: The Importance of Prototyping and Iteration in Poverty Alleviation.  Prototyping is not just a subject of interest for me, it is a way of life.

Mike North (ReAllocate) discussed how he leverages volunteers–individuals and corporations–to prototype useful, innovative products.  In particular, he described his ongoing efforts to prototype an affordable corrective brace for children in developing countries who are born with clubfoot.  You can learn more about it in this video.

Timothy Prestero (Design that Matters) walked us through the process he used to prototype the Firefly.  About 60% of newborns in developing countries suffer from jaundice, and about 10% of these go on to suffer brain damage or another disability.  The treatment is simple–exposure to blue light.  Firefly is the light source.

What is so hard about designing a lamp that shines a blue light?  Human behavior.

For example, hospital workers often put more than one baby in the same phototherapy device, which promotes infectious disease.  Consequently, Firefly needed to be designed in such a way that only one baby could be treated at a time.  It also needed to be inexpensive in order to address the root cause of the problem behavior–too few devices in hospitals.  Understanding these behaviors, and designing with them in mind, requires lengthy prototyping.

Molly Kinder described her work at Development Innovation Ventures (DIV), a part of USAID.  DIV provides financial and other support to innovative projects selected through a competitive process.  In many ways, it looks more like a new-style venture fund than part of a government agency.  And DIV rigorously evaluates the impact of the projects it supports.

Evaluation moment #6: Wow, here is a new-style funder routinely doing high-quality evaluations–including but not limited to randomized control trials–in order to scale projects strategically.

Shawn Powers, from the Jameel Poverty Action Lab at MIT (J-PAL), talked about J-PAL’s efforts to conduct randomized trials in developing countries.  Not surprisingly, I am a big fan of J-PAL, which is dedicated to finding effective ways of improving the lives of the poor and bringing them to scale.

Looking back on the day:  The tight connection between design and evaluation was a prominent theme.  While exploring the theme, the discussion often turned to how evaluation can help social enterprises scale up.  It seems to me that we first need to scale up rigorous evaluation of social enterprises.  The J-PAL model is a good one, but it isn’t possible for academic institutions to scale up fast enough or large enough to meet the need.  So what do we do?


Filed under Conference Blog, Design, Evaluation, Program Design, Program Evaluation

Toward a Taxonomy of Wicked Problems

Program designers and evaluators have become keenly interested in wicked problems.  More precisely, we are witnessing a second wave of interest—one that holds new promise for the design of social, educational, environmental, and cultural programs.

The concept of wicked problems was first introduced in the late 1960s by Horst Rittel, then at UC Berkeley.  It became a popular subject for authors in many disciplines, and writing on the subject grew through the 1970s and into the early 1980s (the first wave).  At that point, writing on the subject slowed until the late 1990s when the popularity of the subject again grew (the second wave).

Here are the results of a Google ngram analysis that illustrates the two waves of interest.

Rittel contrasted wicked problems with tame problems.  Various authors, including Rittel, have described the tame-wicked dichotomy in different ways.  Most are based on the 10 characteristics of wicked problems that Rittel introduced in the early 1970s.  Briefly…

Tame problems can be solved in isolation by an expert—the problems are relatively easy to define, the range of possible solutions can be fully enumerated in advance, stakeholders hold shared values related to the problems and possible solutions, and techniques exist to solve the problems as well as measure the success of implemented solutions.

Wicked problems are better addressed collectively by diverse groups—the problems are difficult to define, few if any possible solutions are known in advance, stakeholders disagree about underlying values, and we can neither solve the problems (in the sense that they can be eliminated) nor measure the success of implemented solutions.

In much of the writing that emerged during the first wave of interest, the tame-wicked dichotomy was the central theme.  It was argued that most problems of interest to policymakers are wicked, which limited the utility of the rational, quantitative, stepwise thinking that dominated policy planning, operations research, and management science at the time.  A new sort of thinking was needed.

In the writing that has emerged in the second wave, that new sort of thinking has been given many names—systems thinking, design thinking, complexity thinking, and developmental thinking, to name a few.  Each, supposedly, can tame what would otherwise be wicked.

Perhaps.

The arguments for “better ways of thinking” are weakened by the assumption that wicked and tame represent a dichotomy.  If most social problems met all 10 of Rittel’s criteria, we would be doomed.  We aren’t.

Social problems are more or less wicked, each in its own way.  Understanding how a problem is wicked, I believe, is what will enable us to think more effectively about social problems and to tame them more completely.

Consider two superficially similar examples that are wicked in different ways.

Contagious disease: We understand the biological mechanisms that would allow us to put an end to many contagious diseases.  In this sense, these diseases are tame problems.  However, we have not been able to eradicate all contagious diseases that we understand well.  The reason, in part, is that many people hold values that conflict with solutions that are, on a biological level, known to be effective.  For example, popular fear of vaccines may undermine the effectiveness of mass vaccination, or the behavioral changes needed to reduce infection rates may clash with local cultures.  In cases such as this, contagious diseases pose wicked problems because of conflicting values.  The design of programs to eradicate these diseases would need to take this source of wickedness into account, perhaps by including strong stakeholder engagement efforts or public education campaigns.

Cancer: We do not fully understand the biological mechanisms that would allow us to prevent and cure many forms of cancer.  At the same time, the behaviors that might reduce the risk of these cancers (such as healthy diet, regular exercise, not smoking, and avoiding exposure to certain chemicals) conflict with values that many people hold (such as the importance of personal freedom, desire for comfort and convenience, and the need to earn a living in certain industrial settings). In these cases, cancer poses wicked problems for two reasons—our lack of understanding and conflicting values.  This may or may not make it “more” wicked than eradicating well-understood contagious diseases; that is difficult to assess.  But it certainly makes it wicked in a different way, and the design of programs to end cancer would need to take that difference into account and address both sources of wickedness.

The two examples above are wicked problems, but they are wicked for different reasons.  Those reasons have important implications for program designers.  My interest over the next few months is to flesh out a more comprehensive taxonomy of wickedness and to unpack its design implications.  Stay tuned.


Filed under Design, Program Design

Should the Pie Chart Be Retired?

The ability to create and interpret visual representations has been an important part of the human experience since we began drawing on cave walls at Chauvet.

Today, that ability—what I call visualcy—has even greater importance.  We use visuals to discover how the world works, communicate our discoveries, plan efforts to improve the world, and document the success of our efforts.

In short, visualcy affects every aspect of program design and evaluation.

The evolution of our common visual language, sadly, has been shaped by the default settings of popular software, the norms of the conference room, and the desire to attract attention.  It is not a language constructed to advance our greater purposes.  In fact, much of our common language works against our greater purposes.

An example of a counterproductive element of our visual language is the pie chart.

Consider this curious example from the New York Times Magazine (1/15/2012).

This pie chart has a humble purpose—summarize reader responses to an article on obesity in the US.  It failed that purpose stunningly.  Here are some reasons why.

(1) Three-dimensionality reduces accuracy: Not only are 3-D graphs harder to read accurately, but popular software can construct them inaccurately.  The problem—for eye and machine—arises from the translation of values in 1-D or 2-D space into values in 3-D space.  This is a substantial problem with pie charts (imagine computing the area of a pie slice while taking its 3-D perspective into account) as well as other types of graph.  Read Stephanie Evergreen’s blog post on the perils of 3-D to see a good example.

(2) Pie charts impede comparisons: People have trouble comparing pie slices by eye.  Think you can?  Here is a simple pie chart I constructed from the data in the NYT Magazine graph.  Which slice is larger—the orange or the blue?

This is much clearer.

Note that the Y axis ranges from 0% to 100%.  That is what makes the bar chart a substitute for the pie chart.  Sometimes the Y axis is truncated innocently to save column inches or intentionally to create a false impression, like this:

Differences are exaggerated and large values seem to be closer to 100% than they really are.  Don’t do this.
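To make the point concrete, here is a rough plotting sketch of the honest versus truncated comparison. The response percentages are hypothetical stand-ins, since the actual NYT Magazine numbers live only in the image above:

```python
import matplotlib.pyplot as plt

# Hypothetical response percentages standing in for the NYT Magazine data,
# which appear only in the original image.
labels = ["Agree", "Disagree", "Unsure"]
percentages = [48, 41, 11]

fig, (ax_full, ax_trunc) = plt.subplots(1, 2, figsize=(8, 3))

for ax, title, ylim in [
    (ax_full, "Full axis (honest)", (0, 100)),
    (ax_trunc, "Truncated axis (misleading)", (10, 50)),
]:
    ax.bar(labels, percentages, color="gray")
    ax.set_ylim(*ylim)            # the truncated axis exaggerates differences
    ax.set_ylabel("Percent of responses")
    ax.set_title(title)

plt.tight_layout()
plt.show()
```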

(3) The visual theme is distracting: I suspect the NYT Magazine graph is intended to look like some sort of food.  Pieces of a pie? Cake? Cheese?  It doesn’t work.  This does.

Unless you are evaluating the Pillsbury Bake-Off, however, it is probably not an appropriate theme.

(4) Visual differentiators add noise: Graphs must often differentiate elements. A classic example is differentiating treatment and control group averages using bars of different colors.  In the NYT Magazine pie chart, the poor choice of busy patterns makes it very difficult to differentiate one piece of the pie from another.  The visual chaos is reminiscent of the results of a “poll” of Iraqi voters presented by the Daily Show in which a very large number of parties purportedly held almost equal levels of support.

(5) Data labels add more noise: Data labels can increase clarity.  In this case, however, the swarm of curved arrows connecting labels to pieces of the pie adds to the visual chaos.  Even this tangle of labels is better because readers instantly understand that Iraq received a disproportionate amount of the aid provided to many countries.

Do you think I made up these reasons?   Then read this report by RAND that investigated graph comprehension using experimental methods.  Here is a snippet from the abstract:

We investigated whether the type of data display (bar chart, pie chart, or table) or adding a gratuitous third dimension (shading to give the illusion of depth) affects the accuracy of answers of questions about the data. We conducted a randomized experiment with 897 members of the American Life Panel, a nationally representative US web survey panel. We found that displaying data in a table lead [sic] to more accurate answers than the choice of bar charts or pie charts. Adding a gratuitous third dimension had no effect on the accuracy of the answers for the bar chart and a small but significant negative effect for the pie chart.

There you have it—empirical evidence that it is time to retire the pie chart.

Alas, I doubt that the NYT Magazine, infographic designers, data viz junkies, or anyone with a reporting deadline will do that.  As every evaluator knows, it is far easier to present empirical evidence than respond to it.


Filed under Design, Evaluation, Visualcy

The Future of Evaluation: 10 Predictions

Before January comes to a close, I thought I would make a few predictions.  Ten to be exact.  That’s what blogs do in the new year, after all.

Rather than make predictions about what will happen this year—in which case I would surely be caught out—I make predictions about what will happen over the next ten years.  It’s safer that way, and more fun as I can set my imagination free.

My predictions are not based on my ideal future.  I believe that some of my predictions, if they came to pass, would present serious challenges to the field (and to me).  Rather, I take trends that I have noticed and push them out to their logical—perhaps extreme—conclusions.

In the next ten years…

(1) Most evaluations will be internal.

The growth of internal evaluation, especially in corporations adopting environmental and social missions, will continue.  Eventually, internal evaluation will overshadow external evaluation.  The job responsibilities of internal evaluators will expand and routinely include organizational development, strategic planning, and program design.  Advances in online data collection and real-time reporting will increase the transparency of internal evaluation, reducing the utility of external consultants.

(2) Evaluation reports will become obsolete.

After-the-fact reports will disappear entirely.  Results will be generated and shared automatically—in real time—with links to the raw data and documentation explaining methods, samples, and other technical matters.  A new class of predictive reports, preports, will emerge.  Preports will suggest specific adjustments to program operations that anticipate demographic shifts, economic shocks, and social trends.

(3) Evaluations will abandon data collection in favor of data mining.

Tremendous amounts of data are being collected in our day-to-day lives and stored digitally.  It will become routine for evaluators to access and integrate these data.  Standards will be established specifying the type, format, security, and quality of “core data” that are routinely collected from existing sources.  As in medicine, core data will represent most of the outcome and process measures that are used in evaluations.

(4) A national registry of evaluations will be created.

Evaluators will begin to record their studies in a central, open-access registry as a requirement of funding.  The registry will document research questions, methods, contextual factors, and intended purposes prior to the start of an evaluation.  Results will be entered or linked at the end of the evaluation.  The stated purpose of the database will be to improve evaluation synthesis, meta-analysis, meta-evaluation, policy planning, and local program design.  It will be the subject of prolonged debate.

(5) Evaluations will be conducted in more open ways.

Evaluations will no longer be conducted in silos.  Evaluations will be public activities that are discussed and debated before, during, and after they are conducted.  Social media, wikis, and websites will be re-imagined as virtual evaluation research centers in which like-minded stakeholders collaborate informally across organizations, geographies, and socioeconomic strata.

(6) The RFP will RIP.

The purpose of an RFP is to help someone choose the best service at the lowest price.  RFPs will no longer serve this purpose well because most evaluations will be internal (see 1 above), information about how evaluators conduct their work will be widely available (see 5 above), and relevant data will be immediately accessible (see 3 above).  Internal evaluators will simply drop their data—quantitative and qualitative—into competing analysis and reporting apps, and then choose the ones that best meet their needs.

(7) Evaluation theories (plural) will disappear.

Over the past 20 years, there has been a proliferation of theories intended to guide evaluation practice.  Over the next ten years, there will be a convergence of theories until one comprehensive, contingent, context-sensitive theory emerges.  All evaluators—quantitative and qualitative; process-oriented and outcome-oriented; empowerment and traditional—will be able to use the theory in ways that guide and improve their practice.

(8) The demand for evaluators will continue to grow.

The demand for evaluators has been growing steadily over the past 20 to 30 years.  Over the next ten years, the demand will not level off due to the growth of internal evaluation (see 1 above) and the availability of data (see 3 above).

(9) The number of training programs in evaluation will increase.

There is a shortage of evaluation training programs in colleges and universities.  The shortage is driven largely by how colleges and universities are organized around disciplines.  Evaluation is typically found as a specialty within many disciplines in the same institution.  That disciplinary structure will soften and the number of evaluation-specific centers and training programs in academia will grow.

(10) The term evaluation will go out of favor.

The term evaluation sets the process of understanding a program apart from the process of managing a program.  Good evaluators have always worked to improve understanding and management.  When they do, they have sometimes been criticized for doing more than determining the merit of a program.  To more accurately describe what good evaluators do, evaluation will become known by a new name, such as social impact management.

…all we have to do now is wait ten years and see if I am right.


Filed under Design, Evaluation, Program Design, Program Evaluation

Tragic Graphic: The Wall Street Journal Lies with Statistics?

Believe it or not, the Wall Street Journal provides another example of an inaccurate circular graph.  This time the error so closely parallels an example from Darrell Huff’s classic How to Lie with Statistics that I find myself wondering—intentional deception or innocent blunder?

The image above comes from Huff’s book.  The moneybag on the left represents the average weekly salary of carpenters in the fictional country of Rotundia.  The bag on the right, the average weekly salary of carpenters in the US.

Based on the graph, how much more do carpenters in the US earn?  Twice?  Three times?  Four times?  More?

The correct answer is that they earn twice as much, but the graph gives the impression that the difference is greater than that.  The heights of the bags are proportionally correct but their areas are not.  Because we tend to focus on the areas of shapes, graphics like this can easily mislead readers.

Misleading the reader, of course, was Huff’s intention.  As he put it:

…I want you to infer something, to come away with an exaggerated impression, but I don’t want to be caught at my tricks.

What were the intentions of the Wall Street Journal this Saturday (1/21/2012) when it previewed Charles Murray’s new book Coming Apart?

In the published preview, Murray made a highly qualified claim—the median family income across 14 of the most elite places to live in 1960 rose from $84,000 in 1960 to $163,000 in 2000, after adjusting incomes to reflect today’s purchasing power.

Those cumbersome qualifications take the oomph right out of the claim.  Because it was too long to be a provocative sound bite, the Journal refashioned the claim into a provocative sight bite.  Wow, those incomes really grew!

But not as much as the graph suggests.  The text states that the median salary just about doubled.  The picture indicates that it quadrupled.  It’s Huff’s moneybag trick—even down to the relative proportion of  salaries!
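Here is the arithmetic behind that visual exaggeration: a small sketch of what happens when a roughly twofold change in income is encoded in a circle’s diameter rather than its area.

```python
import math

# If a designer encodes a roughly 2x increase in income by scaling a
# circle's diameter, the *area* -- which is what the eye compares --
# grows by roughly a factor of four.
income_1960, income_2000 = 84_000, 163_000
value_ratio = income_2000 / income_1960        # about 1.9x

radius_1960 = 1.0
radius_2000 = radius_1960 * value_ratio        # diameter scaled by the value ratio

area_ratio = (math.pi * radius_2000**2) / (math.pi * radius_1960**2)
print(f"value ratio: {value_ratio:.2f}x, apparent (area) ratio: {area_ratio:.2f}x")

# An accurate circle would scale the *area* by the value ratio instead,
# which means scaling the radius by its square root.
honest_radius_2000 = radius_1960 * math.sqrt(value_ratio)
print(f"honest radius ratio: {honest_radius_2000 / radius_1960:.2f}x")
```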

Here is a comparison of the inaccurate graph with an accurate version I constructed.  The accurate graph is far less provocative.

As a rule, the areas of circles are difficult for people to compare by eye.  In fact, using the area of any two-dimensional shape to represent one-dimensional data is probably a bad idea.  Not only do interpretations vary depending on the shape that is used, but they vary depending on the relative placement of the shapes.

To illustrate these points, here are six alternative representations of Murray’s data.  Which, if any, are lies?


Filed under Design, Visualcy