Friday, August 20, 2004

Marginalia

The issue of how to interpret statistical margins of error seems to come up on various blogs all the time, the latest being this post from Matthew Yglesias. It seems to be very difficult, even for intelligent people, to understand why it is not quite right to say things like "there's a 95 percent chance that Kerry is in the lead." Perhaps the easiest way to see why this is wrong is to think about how you would interpret multiple polls.

Suppose you take two polls on the same day with two different samples, asking each person if they plan to vote for Kerry. You are interested in knowing the true percentage of people in the sampled population who would say they are planning to vote for Kerry on that day. Let's say in your first poll, 53% of the people say they are going to vote for Kerry, and in the second poll 47% say they are going to vote for Kerry. If these polls have a 3% margin of error, then the first poll gives you a 95% confidence interval of [50%,56%], and the second one gives you a 95% confidence interval of [44%,50%]. It would be unusual for the polls to come out that far apart, but these things can happen with random sampling (just like you sometimes might toss a coin and get 8 straight heads).
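For anyone who wants to check the arithmetic, here is a minimal sketch of where a 3% margin and those two intervals could come from. It assumes simple random sampling and the usual normal approximation. The sample size of 1,067 is my own hypothetical choice (the polls above don't specify one), and the function names are made up for illustration.

```python
import math

Z_95 = 1.96  # approximate two-sided 95% quantile of the standard normal


def margin_of_error(p_hat, n, z=Z_95):
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def confidence_interval(p_hat, margin):
    """Interval formed by the estimate plus or minus the margin."""
    return (p_hat - margin, p_hat + margin)


# Hypothetical sample size: chosen so the worst-case (p = 0.5) margin is about 3%.
n = 1067
moe = margin_of_error(0.5, n)

first_poll = confidence_interval(0.53, 0.03)   # roughly [50%, 56%]
second_poll = confidence_interval(0.47, 0.03)  # roughly [44%, 50%]

print(f"margin of error with n={n}: {moe:.4f}")
print(f"first poll interval:  {first_poll[0]:.2f} to {first_poll[1]:.2f}")
print(f"second poll interval: {second_poll[0]:.2f} to {second_poll[1]:.2f}")
```

Pollsters typically quote the worst-case margin (computed at 50%), which is why the sketch plugs in 0.5 rather than each poll's own result.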

Now, how do you interpret the results? It makes no sense, from either a Bayesian or frequentist perspective, to say both that there is a 95% chance that Kerry is in the lead, and there is a 95% chance that Kerry is behind. What could such a statement possibly mean?

The correct (although a bit cumbersome) interpretation is to say that both intervals are "constructed using a procedure such that the interval will contain the true value 95% of the time." In other words, if we were to take a million samples and construct a confidence interval for each, we would expect about 950,000 of the intervals to contain the true percentage of Kerry voters (call it K), whatever that might be.
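You can watch this "95% of the time" property in action with a small simulation. What follows is only a sketch under assumptions of my own: the true percentage is set to 50%, each poll samples 1,000 people, and I run 2,000 polls rather than a million to keep it quick. The names `simulate_coverage` and `true_k` are hypothetical, and the interval is the simple normal-approximation one, not necessarily what any real pollster uses.

```python
import math
import random


def simulate_coverage(true_k, n, trials, z=1.96, seed=0):
    """Fraction of simulated polls whose 95% interval contains true_k."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    covered = 0
    for _ in range(trials):
        # Draw one poll: each respondent says "Kerry" with probability true_k.
        yes = sum(rng.random() < true_k for _ in range(n))
        p_hat = yes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_k <= p_hat + margin:
            covered += 1
    return covered / trials


# Assumed values for illustration only.
coverage = simulate_coverage(true_k=0.50, n=1000, trials=2000)
print(f"share of intervals containing the true value: {coverage:.3f}")
```

The printed share should land close to 0.95. Note what is random here: the true value stays fixed while the intervals move from poll to poll, which is exactly why the probability statement attaches to the procedure and not to any single interval.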

In this example, the two intervals share only the single point 50%, so unless the true value happens to be exactly 50%, at least one of the intervals failed to contain it, and we know we've observed a somewhat unusual sample (or two). But the correct interpretation still makes sense, while the incorrect one doesn't.

(BTW, I agree with Matt that statements about "a statistical tie" are usually hogwash.)
