Thursday, September 25, 2008

Figures don't lie, but liars can figure #2

As I've mentioned on this blog before, my dad is fond of saying, "Figures don't lie, but liars can figure." He's pointing out that it is often possible to twist numbers so they seem to support your point of view. You shouldn't accept statistics as valid just because they sound official and scientific.

Here's a good example: the current presidential polls. This is a really good post looking at the last five weeks of the Gallup tracking poll. Gallup releases the total numbers and also breaks them down by party identification (liberal Democrat, moderate Democrat, independent, liberal Republican, etc.). The alert blogger at Wizbang, DJ Drummond, looked at these party ID numbers, did a bunch of math, and concluded (emphasis in the original):
So, put it all together, and in the past week Obama has stayed steady or lost support in every party identification group, yet Gallup says his overall support went up four points. And McCain stayed steady or went up in every party identification group, yet we are supposed to accept the claim that his overall support went down by four points? Anyone have an answer for how that is even possible?

Well, actually I do. There is one, and only one, possible way that such a thing can happen mathematically. And that way, is that Gallup made major changes to the political affiliation weighting from the last week to now. Gallup has significantly increased the proportional weight of Democrat response and reduced the weight of Republican response.
People do change party ID over time. But we're talking about something like an 8% swing in one week for this poll to be accurate. I think that's pretty unlikely. In the past 10 years there's never been more than a 2% change in those identifying as Democrats between elections (the biggest Republican swing is 3%). And that is over a period of 2 years, not one week. Basically, the pollsters are trying to guess at the electorate's party ID, and they all guess differently, and they change their guesses week-to-week.
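
To see how the arithmetic in that quote can work, here is a minimal sketch in Python. All of the numbers are made up for illustration; they are not Gallup's figures, and the three party groups and the function name `overall_support` are my own simplifications. The candidate's support stays flat or drops in every group, but the overall number still rises because the party ID weights change.

```python
# Hypothetical numbers for illustration only; these are NOT Gallup's figures.

def overall_support(support_by_group, weights):
    """Weighted average of a candidate's support across party-ID groups."""
    return sum(support_by_group[g] * weights[g] for g in support_by_group)

# Share of each group backing the candidate: flat or slightly down in week 2.
week1_support = {"dem": 0.85, "ind": 0.45, "rep": 0.10}
week2_support = {"dem": 0.85, "ind": 0.44, "rep": 0.09}

# Assumed share of the sample assigned to each group (each set sums to 1).
# Week 2 shifts six points of weight from Republicans to Democrats.
week1_weights = {"dem": 0.35, "ind": 0.30, "rep": 0.35}
week2_weights = {"dem": 0.41, "ind": 0.30, "rep": 0.29}

week1_total = overall_support(week1_support, week1_weights)
week2_total = overall_support(week2_support, week2_weights)

# Print the change within each group, then the change in the headline number.
for group in week1_support:
    change = week2_support[group] - week1_support[group]
    print(f"{group}: {change * 100:+.1f} points")
print(f"overall: {(week2_total - week1_total) * 100:+.1f} points")
```

With these invented inputs, no group moves toward the candidate, yet the headline number climbs by roughly four points. The whole change comes from the weights.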

That's not the only reason you should take polls with a grain of salt. For starters, you may remember that both polls before the election and exit polls of actual voters overstated Democratic support in 2000 and 2004. I'm not sure why this is true--perhaps it is tied to weighting by party ID! But it is historical evidence of the problem.

You should also know about the registered voter/likely voter issue. We know that not everyone who is registered to vote actually votes in each election. So pollsters ask some questions to try to guess who is a "likely voter" and then release numbers based on those people's choice for president. But, really, these are at best educated guesses. This year, the guesses are complicated even more because Obama is a candidate with particular appeal to the youth vote (under 25); this group traditionally has very low turnout. No one can know in advance who will vote. When you look at a poll, it's important to check whether it is of registered voters or likely voters. If it's registered voters, it may be overstating the Democratic vote, based on historical turnout patterns; if it's likely voters, it might be influenced by the pollster's guesses--possibly biased--about who will actually vote.

There are a whole bunch of other things that can influence a poll. For example, pollsters are starting to get worried about cell phones, which don't get called for most polls. If people have only cell phones and no landline, they can't be in the survey. If these people all tend to vote in one particular way, it could bias the results. A new Pew Report suggests that this might make a 2-3% difference in the results. Another factor can be the age and gender of the interviewer, especially for in-person interviews such as exit polls on the day of the election. Inexperienced interviewers might also have trouble getting enough respondents to cooperate.

Finally, pay attention to the margin of error. Most polls of 1000 or so people have margins of error around 2.5-3%. Remember, this applies to both candidates. So, if the number for McCain is 47% with a 3% margin of error, then the real number is assumed to be 44-50%. Likewise for Obama. If the spread between the candidates is less than double the margin of error, it is really showing a race that is too close to call. Also remember that if the whole poll has a margin of error of 3% and it then breaks the results down by group (men, women, by race, from a particular state), the margin of error for each of those subgroups will be larger than 3%, because each subgroup contains fewer respondents.
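
If you want to check the margin of error yourself, here is a rough sketch in Python using the standard textbook formula for a simple random sample. It assumes a 95% confidence level (z = 1.96) and the worst case of a 50/50 split; real pollsters adjust for weighting and sample design, so their published figures can differ. The function names and the subgroup size of 250 are my own, chosen just for illustration.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Textbook margin of error for a simple random sample.

    Assumes a 95% confidence level (z = 1.96) and, by default,
    the worst case of a 50/50 split.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

def too_close_to_call(candidate_a, candidate_b, moe):
    """Rule of thumb from the post: the spread is less than double the margin."""
    return abs(candidate_a - candidate_b) < 2 * moe

full_poll = margin_of_error(1000)  # the whole sample of about 1000 people
subgroup = margin_of_error(250)    # hypothetical subgroup of 250 respondents

print(f"full poll: +/- {full_poll * 100:.1f} points")
print(f"subgroup:  +/- {subgroup * 100:.1f} points")
print(too_close_to_call(0.47, 0.49, full_poll))
```

Because the margin shrinks with the square root of the sample size, cutting the sample to a quarter doubles the margin of error. That is why subgroup numbers deserve extra caution.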

So that's some of what I know about surveys, from my job and from a lot of reading on the subject. Bottom line, don't just read a headline about a poll and assume that it is the honest truth! The poll that counts is the actual vote.

If you're a statistical geek like me, and you want more info on this topic, you might want to check out pollster.com. For exit polls in particular, this pollster.com page is a good place to start, with lots of links.
