The ability to create and interpret visual representations has been an important part of the human experience since we began drawing on cave walls at Chauvet.
Today, that ability—what I call visualcy—has even greater importance. We use visuals to discover how the world works, communicate our discoveries, plan efforts to improve the world, and document the success of our efforts.
In short, visualcy affects every aspect of program design and evaluation.
The evolution of our common visual language, sadly, has been shaped by the default settings of popular software, the norms of the conference room, and the desire to attract attention. It is not a language constructed to advance our greater purposes. In fact, much of our common language works against our greater purposes.
An example of a counterproductive element of our visual language is the pie chart.
Consider this curious example from the New York Times Magazine (1/15/2012).
This pie chart has a humble purpose: to summarize reader responses to an article on obesity in the US. It failed at that purpose stunningly. Here are some reasons why.
(1) Three-dimensionality reduces accuracy: Not only are 3-D graphs harder to read accurately, but popular software can construct them inaccurately. The problem, for eye and machine alike, arises from the translation of values in 1-D or 2-D space into values in 3-D space. This is a substantial problem with pie charts (imagine computing the area of a pie slice while taking its 3-D perspective into account) as well as other types of graphs. Read Stephanie Evergreen’s blog post on the perils of 3-D to see a good example.
(2) Pie charts impede comparisons: People have trouble comparing pie slices by eye. Think you can? Here is a simple pie chart I constructed from the data in the NYT Magazine graph. Which slice is larger, the orange or the blue?
This is much clearer.
Note that the Y axis ranges from 0% to 100%. That is what makes the bar chart a substitute for the pie chart. Sometimes the Y axis is truncated innocently to save column inches or intentionally to create a false impression, like this:
Differences are exaggerated and large values seem to be closer to 100% than they really are. Don’t do this.
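To put rough numbers on that distortion, here is a minimal sketch in Python. The two percentages and the truncated axis range are hypothetical values chosen only for illustration (they are not taken from the NYT Magazine graph), and the helper names `drawn_height` and `visual_ratio` are my own.

```python
def drawn_height(value, axis_min, axis_max=100.0):
    """Fraction of the plot height a bar fills when the Y axis runs
    from axis_min to axis_max (all values in percent)."""
    if not axis_min <= value <= axis_max:
        raise ValueError("value must lie within the axis range")
    return (value - axis_min) / (axis_max - axis_min)


def visual_ratio(a, b, axis_min, axis_max=100.0):
    """How many times taller bar a is drawn than bar b."""
    return drawn_height(a, axis_min, axis_max) / drawn_height(b, axis_min, axis_max)


# Hypothetical data: two groups at 40% and 50%.
small, large = 40.0, 50.0

# Honest axis, 0% to 100%: the taller bar is drawn 1.25 times as tall.
full_ratio = visual_ratio(large, small, axis_min=0.0)

# Hypothetical truncated axis, 30% to 60%: the same bar is drawn twice as tall.
truncated_ratio = visual_ratio(large, small, axis_min=30.0, axis_max=60.0)

# On the truncated axis the 50% bar fills two thirds of the plot,
# so it also looks much closer to the top than it should.
truncated_fill = drawn_height(large, axis_min=30.0, axis_max=60.0)

print(f"ratio on 0-100 axis:  {full_ratio:.2f}")
print(f"ratio on 30-60 axis:  {truncated_ratio:.2f}")
print(f"plot height filled by the 50% bar on the 30-60 axis: {truncated_fill:.2f}")
```

The data never change; only the axis does. A 10-point gap that is drawn as a 25% difference in bar height on the full axis is drawn as a 100% difference on the truncated one.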
(3) The visual theme is distracting: I suspect the NYT Magazine graph is intended to look like some sort of food. Pieces of a pie? Cake? Cheese? It doesn’t work. This does.
Unless you are evaluating the Pillsbury Bake-Off, however, it is probably not an appropriate theme.
(4) Visual differentiators add noise: Graphs must often differentiate elements. A classic example is differentiating treatment and control group averages using bars of different colors. In the NYT Magazine pie chart, the poor choice of busy patterns makes it very difficult to differentiate one piece of the pie from another. The visual chaos is reminiscent of the results of a “poll” of Iraqi voters presented by the Daily Show in which a very large number of parties purportedly held almost equal levels of support.
(5) Data labels add more noise: Data labels can increase clarity. In this case, however, the swarm of curved arrows connecting labels to pieces of the pie adds to the visual chaos. Even this tangle of labels is better because readers instantly understand that Iraq received a disproportionate amount of the aid provided to many countries.
Do you think I made up these reasons? Then read this report by RAND that investigated graph comprehension using experimental methods. Here is a snippet from the abstract:
We investigated whether the type of data display (bar chart, pie chart, or table) or adding a gratuitous third dimension (shading to give the illusion of depth) affects the accuracy of answers of questions about the data. We conducted a randomized experiment with 897 members of the American Life Panel, a nationally representative US web survey panel. We found that displaying data in a table lead [sic] to more accurate answers than the choice of bar charts or pie charts. Adding a gratuitous third dimension had no effect on the accuracy of the answers for the bar chart and a small but significant negative effect for the pie chart.
There you have it—empirical evidence that it is time to retire the pie chart.
Alas, I doubt that the NYT Magazine, infographic designers, data viz junkies, or anyone with a reporting deadline will do that. As every evaluator knows, it is far easier to present empirical evidence than respond to it.
Evaluation in the Post-Data Age: What Evaluators Can Learn from the 2012 Presidential Election
Stop me if you’ve heard this one before. An evaluator uses data to assess the effectiveness of a program, arrives at a well-reasoned but disappointing conclusion, and finds that the conclusion is not embraced—perhaps ignored or even rejected—by those with a stake in the program.
People—even evaluators—have difficulty accepting new information if it contradicts their beliefs, desires, or interests. It’s unavoidable. When faced with empirical evidence, however, most people will open their minds. At least that has been my experience.
During the presidential election, reluctance to embrace empirical evidence was virtually universal. I began to wonder—had we entered the post-data age?
The human race creates an astonishing amount of data—2.5 quintillion bytes of data per day. In the last two years, we created 90% of all data created throughout human history.
In that time, I suspect that we also engaged in more denial and distortion of data than in all human history.
The election was a particularly bad time for data and the people who love them—but there was a bright spot.
On election day I boarded a plane for London (after voting, of course). Although I had no access to news reports during the flight, I already knew the result—President Obama had about an 84% chance of winning reelection. When I stepped off the plane, I learned he had indeed won. No surprise.
How could I be so certain of the result when the election was hailed as too close to call? I read the FiveThirtyEight blog, that’s how. By using data (every available, well-implemented poll) and a strong statistical model, Nate Silver was able to produce a highly credible estimate of the likelihood that one or the other candidate would win.
Most importantly, the estimate did not depend on the analysts’—or anyone’s—desires regarding the outcome of the election.
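Why does pooling many polls beat staring at one? Here is a minimal sketch in Python. It is emphatically not the FiveThirtyEight model, which is far richer than this; it is a plain average of simulated polls. The true vote share, the sample size, and the number of polls are all hypothetical values, and the function names are my own.

```python
import math
import random


def simulate_poll(true_share, sample_size, rng):
    """Simulate one simple random-sample poll and return the observed share."""
    hits = sum(rng.random() < true_share for _ in range(sample_size))
    return hits / sample_size


def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(share * (1 - share) / sample_size)


rng = random.Random(2012)   # fixed seed so the sketch is repeatable

TRUE_SHARE = 0.51           # hypothetical true support for a candidate
SAMPLE_SIZE = 1000          # hypothetical respondents per poll
NUM_POLLS = 25              # hypothetical number of comparable polls

polls = [simulate_poll(TRUE_SHARE, SAMPLE_SIZE, rng) for _ in range(NUM_POLLS)]

# One poll: a single data point with a margin of roughly three points.
single_moe = margin_of_error(polls[0], SAMPLE_SIZE)

# All polls pooled: a simple unweighted average over 25 times the respondents.
pooled_share = sum(polls) / len(polls)
pooled_moe = margin_of_error(pooled_share, SAMPLE_SIZE * NUM_POLLS)

print(f"first poll:   {polls[0]:.3f} +/- {single_moe:.3f}")
print(f"pooled polls: {pooled_share:.3f} +/- {pooled_moe:.3f}")
```

Under these assumptions, a single poll cannot distinguish a 51% candidate from a coin flip, while the pooled estimate has a margin of error about one fifth as wide. Real polls differ in quality, timing, and method, which is why a real model has to do much more than average them.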
Although this first-rate work was available to all, television and print news was dominated by unsophisticated analysis of poll data. How often were the results of an individual poll—one data point—presented in a provocative way and its implications debated for as long as breath and column inches could sustain?
Isn’t this the way that we interpret evaluations?
News agencies were looking for the story. The advocates for each candidate were telling their stories. Nothing wrong with that. But when stories shape the particular bits of data that are presented to the public, rather than all of the data being used to shape the story, I fear that the post-data age is already upon us.
Are evaluators expected to do the same when they are asked to tell a program’s story?
It has become acceptable to use data poorly or opportunistically while asserting that our conclusions are data driven. All the while, much stronger conclusions based on better data and data analysis are all around us.
Do evaluators promote similar behavior when we insist that all forms of evaluation can improve data-driven decision making?
The New York Times reported that on election night one commentator, with a sizable stake in the outcome, was unable to accept that actual voting data were valid because they contradicted the story he wanted to tell.
He was already living in the post-data age. Are we?