Evaluation in the Post-Data Age: What Evaluators Can Learn from the 2012 Presidential Election

Stop me if you’ve heard this one before.  An evaluator uses data to assess the effectiveness of a program, arrives at a well-reasoned but disappointing conclusion, and finds that the conclusion is not embraced—perhaps ignored or even rejected—by those with a stake in the program.

People—even evaluators—have difficulty accepting new information if it contradicts their beliefs, desires, or interests.  It’s unavoidable.  When faced with empirical evidence, however, most people will open their minds.  At least that has been my experience.

During the presidential election, reluctance to embrace empirical evidence was virtually universal.  I began to wonder—had we entered the post-data age?

The human race creates an astonishing amount of data—2.5 quintillion bytes every day.  In the past two years alone, we created 90% of all the data in human history.

In that time, I suspect that we also engaged in more denial and distortion of data than in all human history.

The election was a particularly bad time for data and the people who love them—but there was a bright spot.

On election day I boarded a plane for London (after voting, of course).  Although I had no access to news reports during the flight, I already knew the result—President Obama had about an 84% chance of winning reelection.  When I stepped off the plane, I learned he had indeed won.  No surprise.

How could I be so certain of the result when the election was hailed as too close to call?  I read the FiveThirtyEight blog, that’s how.  By using data—every available, well-implemented poll—and a strong statistical model, Nate Silver was able to produce a highly credible estimate of the likelihood that one or the other candidate would win.

Most importantly, the estimate did not depend on the analysts’—or anyone’s—desires regarding the outcome of the election.
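To make the logic concrete, here is a toy sketch of poll aggregation in Python.  The polls are invented and the model is a bare-bones caricature of poll averaging, not Silver’s actual method: pool the polled margins, estimate the uncertainty of the average, and ask how likely the true margin is to favor one candidate.

```python
import math

# Invented polls for illustration only; this is NOT Nate Silver's model.
polls = [
    # (candidate A's margin in points, sample size)
    (1.5, 800),
    (0.5, 1200),
    (2.0, 600),
    (-1.0, 900),
]

# Weight each poll by its sample size and compute the pooled average margin.
total_n = sum(n for _, n in polls)
avg_margin = sum(m * n for m, n in polls) / total_n

# Rough standard error of the pooled margin, in points.  A two-party margin
# from one poll has variance about 4*p*(1-p)/n; use p = 0.5 and ignore
# house effects, correlated errors, and everything else a real model handles.
se = math.sqrt(4 * 0.5 * 0.5 / total_n) * 100

# Probability the true margin is positive, assuming a normal distribution.
p_win = 0.5 * (1 + math.erf(avg_margin / (se * math.sqrt(2))))
print(f"average margin: {avg_margin:+.2f} points, P(win) = {p_win:.0%}")
```

Even this caricature makes the key point: the estimate comes from all the polls at once, and from nobody’s wishes.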

Although this first-rate work was available to all, television and print news was dominated by unsophisticated analysis of poll data.  How often were the results of an individual poll—one data point—presented in a provocative way and its implications debated for as long as breath and column inches could sustain?

Isn’t this the way that we interpret evaluations?

News agencies were looking for the story.  The advocates for each candidate were telling their stories.  Nothing wrong with that.  But when stories shape the particular bits of data that are presented to the public, rather than all of the data being used to shape the story, I fear that the post-data age is already upon us.

Are evaluators expected to do the same when they are asked to tell a program’s story?

It has become acceptable to use data poorly or opportunistically while asserting that our conclusions are data driven.  All the while, much stronger conclusions based on better data and data analysis are all around us.

Do evaluators promote similar behavior when we insist that all forms of evaluation can improve data-driven decision making?

The New York Times reported that on election night one commentator, with a sizable stake in the outcome, was unable to accept that actual voting data were valid because they contradicted the story he wanted to tell.

He was already living in the post-data age.  Are we?

Should the Pie Chart Be Retired?

The ability to create and interpret visual representations has been an important part of the human experience since we began drawing on cave walls at Chauvet.

Today, that ability—what I call visualcy—has even greater importance.  We use visuals to discover how the world works, communicate our discoveries, plan efforts to improve the world, and document the success of our efforts.

In short, visualcy affects every aspect of program design and evaluation.

The evolution of our common visual language, sadly, has been shaped by the default settings of popular software, the norms of the conference room, and the desire to attract attention.  It is not a language constructed to advance our greater purposes.  In fact, much of our common language works against our greater purposes.

An example of a counterproductive element of our visual language is the pie chart.

Consider this curious example from the New York Times Magazine (1/15/2012).

This pie chart has a humble purpose—summarize reader responses to an article on obesity in the US.  It failed that purpose stunningly.  Here are some reasons why.

(1) Three-dimensionality reduces accuracy: Not only are 3-D graphs harder to read accurately, but popular software can construct them inaccurately.  The problem—for eye and machine—arises from translating values in 1-D or 2-D space into values in 3-D space.  This is a substantial problem with pie charts (imagine computing the area of a pie slice while taking its 3-D perspective into account) as well as with other types of graphs.  Read Stephanie Evergreen’s blog post on the perils of 3-D to see a good example.

(2) Pie charts impede comparisons: People have trouble comparing pie slices by eye.  Think you can?  Here is a simple pie chart I constructed from the data in the NYT Magazine graph.  Which slice is larger—the orange or the blue?

This is much clearer.

Note that the Y axis ranges from 0% to 100%.  That is what makes the bar chart a substitute for the pie chart.  Sometimes the Y axis is truncated innocently to save column inches or intentionally to create a false impression, like this:

Differences are exaggerated and large values seem to be closer to 100% than they really are.  Don’t do this.
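For anyone building the honest version, here is a minimal matplotlib sketch: a bar chart with the Y axis pinned to the full 0% to 100% range.  The labels and values are placeholders, since the magazine’s data aren’t reproduced here.

```python
import matplotlib.pyplot as plt

# Hypothetical response percentages standing in for the NYT Magazine data.
labels = ["Agree", "Disagree", "Not sure"]
values = [48, 41, 11]

fig, ax = plt.subplots()
ax.bar(labels, values)
ax.set_ylim(0, 100)  # full 0-100% axis; no truncation, no exaggeration
ax.set_ylabel("Percent of responses")
plt.show()
```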

(3) The visual theme is distracting: I suspect the NYT Magazine graph is intended to look like some sort of food.  Pieces of a pie? Cake? Cheese?  It doesn’t work.  This does.

Unless you are evaluating the Pillsbury Bake-Off, however, it is probably not an appropriate theme.

(4) Visual differentiators add noise: Graphs must often differentiate elements. A classic example is differentiating treatment and control group averages using bars of different colors.  In the NYT Magazine pie chart, the poor choice of busy patterns makes it very difficult to differentiate one piece of the pie from another.  The visual chaos is reminiscent of the results of a “poll” of Iraqi voters presented by the Daily Show in which a very large number of parties purportedly held almost equal levels of support.

(5) Data labels add more noise: Data labels can increase clarity.  In this case, however, the swarm of curved arrows connecting labels to pieces of the pie adds to the visual chaos.  Even this tangle of labels is better because readers instantly understand that Iraq received a disproportionate amount of the aid provided to many countries.

Do you think I made up these reasons?   Then read this report by RAND that investigated graph comprehension using experimental methods.  Here is a snippet from the abstract:

We investigated whether the type of data display (bar chart, pie chart, or table) or adding a gratuitous third dimension (shading to give the illusion of depth) affects the accuracy of answers of questions about the data. We conducted a randomized experiment with 897 members of the American Life Panel, a nationally representative US web survey panel. We found that displaying data in a table lead [sic] to more accurate answers than the choice of bar charts or pie charts. Adding a gratuitous third dimension had no effect on the accuracy of the answers for the bar chart and a small but significant negative effect for the pie chart.

There you have it—empirical evidence that it is time to retire the pie chart.

Alas, I doubt that the NYT Magazine, infographic designers, data viz junkies, or anyone with a reporting deadline will do that.  As every evaluator knows, it is far easier to present empirical evidence than respond to it.

Tragic Graphic: The Wall Street Journal Lies with Statistics?

Believe it or not, the Wall Street Journal provides another example of an inaccurate circular graph.  This time the error so closely parallels an example from Darrell Huff’s classic How to Lie with Statistics that I find myself wondering—intentional deception or innocent blunder?

The image above comes from Huff’s book.  The moneybag on the left represents the average weekly salary of carpenters in the fictional country of Rotundia.  The bag on the right, the average weekly salary of carpenters in the US.

Based on the graph, how much more do carpenters in the US earn?  Twice?  Three times?  Four times?  More?

The correct answer is that they earn twice as much, but the graph gives the impression that the difference is greater than that.  The heights of the bags are proportionally correct but their areas are not.  Because we tend to focus on the areas of shapes, graphics like this can easily mislead readers.
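The arithmetic behind the trick fits in a few lines of Python: scale every linear dimension of a drawing in proportion to the data, and the area, which is what the eye actually compares, grows with the square of that factor.

```python
# Huff's moneybag trick: if every linear dimension is multiplied by k,
# the area is multiplied by k**2.
salary_ratio = 2.0            # US carpenters earn twice as much
height_ratio = salary_ratio   # heights are drawn in proportion to salary
area_ratio = height_ratio ** 2

print(area_ratio)  # 4.0 -- the picture "says" four times, not two
```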

Misleading the reader, of course, was Huff’s intention.  As he put it:

…I want you to infer something, to come away with an exaggerated impression, but I don’t want to be caught at my tricks.

What were the intentions of the Wall Street Journal this Saturday (1/21/2012) when it previewed Charles Murray’s new book Coming Apart?

In the published preview, Murray made a highly qualified claim—the median family income across 14 of the most elite places to live rose from $84,000 in 1960 to $163,000 in 2000, after adjusting incomes to reflect today’s purchasing power.

Those cumbersome qualifications take the oomph right out of the claim.  It was too long to be a provocative sound bite, so the Journal refashioned it into a provocative sight bite.  Wow, those incomes really grew!

But not as much as the graph suggests.  The text states that the median income just about doubled.  The picture indicates that it quadrupled.  It’s Huff’s moneybag trick—even down to the relative proportion of the incomes!

Here is a comparison of the inaccurate graph with an accurate version I constructed.  The accurate graph is far less provocative.

As a rule, the areas of circles are difficult for people to compare by eye.  In fact, using the area of any two-dimensional shape to represent one-dimensional data is probably a bad idea.  Not only do interpretations vary depending on the shape that is used, but they vary depending on the relative placement of the shapes.
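If a designer insists on encoding values as areas, the fix is to scale the radius (or side length) by the square root of the value ratio.  Here is a short Python sketch using Murray’s income figures:

```python
import math

# To encode a value ratio v in the AREA of a circle (or any 2-D shape),
# scale the radius (or side) by sqrt(v), not by v itself.
v = 163_000 / 84_000                 # 2000 income vs. 1960 income, ~1.94
radius_scale_correct = math.sqrt(v)  # ~1.39: what an honest graph uses
radius_scale_wrong = v               # ~1.94: inflates the AREA by v**2

print(f"correct radius scale: {radius_scale_correct:.2f}")
print(f"apparent area ratio if radius scales by v: {radius_scale_wrong ** 2:.1f}")  # ~3.8
```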

To illustrate these points, here are six alternative representations of Murray’s data.  Which, if any, are lies?

Tragic Graphic: The New York Times Checks Facts, Not Math

Over my morning coffee, I found myself staring at this bulldog graph in the New York Times Magazine (12/11/11).  Something was wrong.  At first I couldn’t put my finger on it.  Then it hit me—the relative size of the two bulldogs couldn’t possibly be correct.

I did a little forensic data analysis (that is, I used a ruler to measure the bodies of the bulldogs and for each computed the area of an ellipse—it turns out that geometry is useful).  As I suspected, the area of the bulldog on the right is too small.  If the bulldog on the left represents 58% of responses, then the bulldog on the right represents only about 30%.  Oops.
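For the curious, here is roughly what that forensic check looks like in Python.  The semi-axis measurements are hypothetical stand-ins for my ruler readings, included only to illustrate the method:

```python
import math

# Approximate each bulldog's body as an ellipse and compare areas.
# The measurements below are hypothetical stand-ins for ruler readings.
def ellipse_area(semi_major, semi_minor):
    return math.pi * semi_major * semi_minor

left = ellipse_area(2.9, 2.0)    # bulldog representing 58% of responses
right = ellipse_area(2.1, 1.45)  # the smaller bulldog

implied_share = 58 * (right / left)
print(f"implied share: {implied_share:.0f}%")  # ~30%, well short of nominal
```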

Here is how large the bulldog should be.  Quite a difference.

The heights of the bulldogs, as they originally appeared, were proportionally correct.  That made me wonder.  If you change the size of an image using most software, it changes the length and width proportionally—not the area.  Is that how the error was made?
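A quick test of that hypothesis, assuming the two response shares were 58% and 42% (the second figure isn’t reproduced here): shrinking both dimensions by their ratio squares the effect on area.

```python
# If the right bulldog was shrunk uniformly so its HEIGHT is in proportion
# to the data, the area shrinks by the square of that factor.
# The 42% share is an assumption; the post only gives the 58% figure.
left_share, right_share = 58, 42

linear_scale = right_share / left_share  # ~0.72, applied to both dimensions
area_scale = linear_scale ** 2           # ~0.52 -- what the eye sees

print(f"{left_share * area_scale:.0f}%")  # ~30%, matching the ruler check above
```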

The Times wouldn’t make a mistake like that, I reasoned.  Maybe the image was supposed to be a stylized bar graph (but in that case the width of the images should have remained constant).  In any event, the graph addressed a trivial topic.  I went back to my coffee confident in my belief that the New York Times would never make such a blunder on an important issue.

Then I turned the page and found this.

The same mistake.  This time the graph was related to an article on US banking policy, hardly a trivial topic.  The author wanted to impress upon the reader that a few banks control a substantial share of the market.  A pity the image shows the market shares to be far smaller than they really are.

The image below illustrates the nature of the error—confusing proportional change in diameter for a proportional change in the area of a circle.

Below is a comparison of the original inaccurate graph and an accurate revision that I quickly constructed.  Note that the area of the original red circle represents a market share of just under 20%—less than half its nominal value.

And here is an alternative graphic using squares instead of circles.  A side-by-side presentation may allow readers to compare the relative sizes of market shares more easily than overlaid, Venn-style shapes.

I did a bit of digging and found that another person had noticed the error and pointed it out to the Times online (you really need to dig to find the comment).  To date, the inaccurate graph is still on the Times website, and I have not found a printed correction.

Apparently, checking facts does not include checking math.
