05 November 2005

Why Statistics Matter

Big Or Confusing Numbers Require Statistics

Picture of part of the Million Man March, by Smithsonian Institution photographer, from http://photo2.si.edu/mmm/mmm2.html

We are faced every day with oceans of facts and figures. It is impossible to consider each fact individually, so we use "statistics" to deal with these piles of numbers. "Statistics" are numbers that describe, or summarize, groups of other numbers. The study of this type of analysis and description of unmanageable bunches of data is called "Statistics". How many people attended the Million Man March? (More on crowd numbers at the bottom of this post.)

Statistics Help Us See Patterns

Sometimes these patterns, the conclusions we derive from the raw information, are important. For example:

Bad Statistics = Bad Decisions

Statistics that are used improperly or misleadingly can cause you to misinterpret the underlying data, leading to bad decisions. (The examples below assume nobody is actually lying. Of course a lot of figures you read are just completely fake, but nobody has bothered to verify them.)

Example 1

Suppose you are listening to three political candidates, and you want to vote for the one who is most likely to work to preserve the environment. Candidate Able says she voted for green legislation 20 times in her last term in office. Candidate Baker says he voted for 80% of the green bills that were proposed during his last term. Candidate Charlie says she has voted for more green legislation than either Able or Baker.

Before you vote you might want to know that:
  • Although Able voted green 20 times, she voted against green legislation 100 times. She neglects to mention this.

  • Although Baker voted for 80% of the green bills proposed, he voted against the most important and significant bills. He has padded his figures with many minor measures that might be considered environmental.

  • Candidate Charlie has been in the legislature for much longer than either Able or Baker. In her earlier terms she voted for many pieces of green legislation, but more recently she has voted against all green measures.
Better keep looking for a candidate friendly to the environment.
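
Able's claim is a good illustration of why a raw count needs a denominator. Here is a minimal sketch in Python that turns her "20 times" into a rate, using only the two figures given above (20 votes for, 100 against). The function name green_rate is made up for this illustration.

```python
def green_rate(votes_for, votes_against):
    """Return the share of green bills a candidate supported (0.0 to 1.0)."""
    total = votes_for + votes_against
    return votes_for / total

# Figures from Example 1: Able voted green 20 times and against 100 times.
able_rate = green_rate(20, 100)
print(f"Able supported {able_rate:.0%} of the green bills she voted on")
```

Put that way, Able's record looks much weaker than Baker's 80%, even though both numbers sounded impressive on their own. Note that a rate still would not catch Baker's or Charlie's tricks, because it says nothing about which bills were voted on or when.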

Example 2

The average pay at Company A is higher than the average pay at Company B. Which would you rather work for? Before you answer, consider that the "average" can be misleading. The CEO at Company A makes ten times the salary of the CEO at Company B, thus "raising the average". All the other workers at Company A earn less than their counterparts at Company B.

So unless you are going to be CEO, you will get paid more at Company B.
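
To see how one large salary can drag the average upward, here is a small Python sketch comparing the mean with the median (the middle value, which is much less sensitive to a single extreme figure). All of the salary figures are invented for illustration; the only thing taken from the example above is that Company A's CEO earns ten times what Company B's CEO earns, and that every other worker at A earns less than at B.

```python
from statistics import mean, median

# Hypothetical salaries: one CEO plus nine other workers at each company.
company_a = [2_000_000] + [40_000] * 9   # CEO earns 10x Company B's CEO
company_b = [200_000] + [50_000] * 9     # every non-CEO earns more than at A

for name, salaries in (("Company A", company_a), ("Company B", company_b)):
    print(f"{name}: mean = {mean(salaries):,.0f}, median = {median(salaries):,.0f}")
```

With these made-up numbers, Company A has the higher mean but the lower median. The median is closer to what a typical new hire could expect, which is why it is worth asking which kind of "average" a report is quoting.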

If You Don't Understand Statistics, You Can't Spot Bad Statistics

Statistics are widely used in the news media, in government reports, and in many other information sources. The purpose should be to make the raw information easier to understand, but misuse of statistics (sometimes deliberate, sometimes merely incompetent) often causes misinformation or confusion.

Three things to keep in mind when you see statistics or other numbers in media articles or web sites:
  • Reporters and their editors believe people like to see "facts" and figures, so they try to find some to put in.

  • Reporters (like most other people) don't have a clue about statistics.

  • Reporters and most other writers are on a deadline.
Therefore it is up to you to ask:
  • What is the source of that number?

  • How certain is that number? What is the range of uncertainty?

  • What (possibly confused) calculations were used to arrive at that number?

Examples: Crowd Numbers

Aerial photo of Million Man March from http://observe.arc.nasa.gov/nasa/exhibits/march/March_2.html

News stories about demonstrations or other events often include numbers representing the size of the crowd. Nobody actually enumerated the crowd, counting each member, so such numbers are always estimates.
  • What is the source of the estimate? (Consider possible bias.)

  • What method was used? (Each has its pros and cons.)

  • Were the raw data further manipulated? (For example by averaging.)
Here is a good article on crowd estimation. Here is another. Crowds at events in New York City have been estimated by the quantity of garbage they leave behind.

"Counting the March" is an excellent site about using aerial imaging to count the Million Man March.

Additional Resources

Robert Niles's site on statistics for journalists. Excellent.

A good discussion of misinterpretation of statistics by the media, from Statistics Canada.

Another good site on the importance of proper use of statistics.

Numberwatch -- "All about the scares, scams, junk, panics, and flummery cooked up by the media, politicians, bureaucrats, so-called scientists and others who try to confuse you with wrong numbers."

At "stats", "We check out the facts and figures behind the news," and unsurprisingly, often find them misleading or wrong.

Nice article on innumeracy among journalists.
David Wheat's Science In Action site has articles about science and math in the real world, weird science, science news, unexpected connections, and other cool science stuff. There is an index of the articles by topic here.

