# Mathematics of Elections: Polling

Since the internet is overflowing with politics these days, we should take a minute to talk about some of the mathematics of elections. We encourage everyone to vote and to take an interest in the issues, but we’ll steer clear of all of that here. Comments are welcome as always, but we’ll moderate away any that stray too far.

There are three kinds of lies: lies, damned lies, and statistics. — Benjamin Disraeli

In this post we’ll talk a little about the mathematics of polling.  Of course, everybody knows about polls.  Hundreds or thousands of people are contacted and asked to answer a set of questions.  Based on a statistical analysis of their answers, the results are published saying things like:

“People like vanilla ice cream more than chocolate ice cream, 63% to 21%, with an error of plus/minus 3%”

What does this mean?

Let’s say you’ve selected 1000 people who accurately reflect the adults of the US, and they were polled with the question in this example. The above result means 630 of them said they like vanilla, 210 said chocolate, and the rest said something else or declined to answer the question. Now imagine you asked everyone in the US the same question and P% of them said they prefer vanilla. The above poll tells us that

$60=63-3 \leq P \leq 63+3 = 66.$

So somewhere between 60% and 66% of American adults prefer vanilla ice cream.

Actually we should be even more careful.  A poll comes with some confidence level.  Unless stated otherwise, a poll usually comes with a 95% confidence level.  Let’s say our poll has a 95% confidence level.

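What the poll really tells us is that, at a 95% confidence level, between 60% and 66% of American adults prefer vanilla ice cream. Roughly speaking, if the same poll were repeated many times, about 95% of the intervals produced this way would contain the true value P.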

There is always some chance that a poll happens to select people who like vanilla ice cream unusually much, purely by chance. If the poll were of only 10 people, the odds aren’t bad that you would pick a few too many vanilla lovers and skew the results. This becomes less and less likely as you poll more and more people. Alternatively, you can widen the margin of error. For example, if you change the margin of error to plus/minus 5%, the pollsters are now claiming that P is between 58% and 68%, and it’s more likely that the true value lies in this wider window.
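To see what "95% confidence" means in practice, here is a small Python simulation; it is our own sketch, not a method taken from any polling organization. It assumes a hypothetical population in which exactly 63% prefer vanilla, repeatedly polls 1,000 people chosen at random, and counts how often the poll's plus/minus interval (computed with the standard normal-approximation formula for a proportion) actually contains 63%.

```python
import math
import random

def coverage_rate(true_p, n, trials=2000, z=1.96, seed=1):
    """Fraction of simulated polls whose interval contains the true share.

    true_p is a hypothetical population share (e.g. 0.63 prefer vanilla);
    each trial polls n people chosen at random from that population.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = 0
    for _ in range(trials):
        # Each respondent independently prefers vanilla with probability true_p.
        yes = sum(rng.random() < true_p for _ in range(n))
        p_hat = yes / n
        # Normal-approximation margin of error at the chosen z (1.96 ~ 95%).
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / trials

print(coverage_rate(0.63, 1000))
```

Running it should print a number close to 0.95: about 1 poll in 20 misses the true value purely by chance, even though nothing went wrong with how it was conducted.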

On his blog, Terence Tao discusses the mathematics of how one can verify the confidence level and margin of error even when polling only a very tiny fraction of the population, such as 1000/300,000,000 ≈ 0.0000033 (i.e., about 0.00033%).
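The margin of error itself comes from a short formula. As a minimal sketch, assuming a simple random sample and the usual normal approximation (real pollsters also adjust for weighting and other design effects, which this ignores): for a sample of n people, the 95% margin of error for a proportion p is about $1.96\sqrt{p(1-p)/n}$, and $p = 0.5$ gives the largest value. Notice that the size of the population never appears.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    n: number of people polled; p: the share being estimated (0.5 is the
    worst case); z = 1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (10, 100, 876, 1000, 10000):
    print(f"n = {n:>5}: +/- {100 * margin_of_error(n):.1f} points")
```

This prints about ±31 points for 10 people, ±3.1 for 1,000, and ±3.3 for 876, close to the ±3 reported in the Ohio poll below. Because the population size does not enter the formula (as long as the population is much larger than the sample), 1,000 people give roughly the same precision whether the population is a city or the whole country.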

In any case, when you read a poll you should really say to yourself that the odds are very good that the true value is somewhere in the range given by the stated value plus/minus the error.   For example, very recently the University of Cincinnati polled likely voters in Ohio.

University of Cincinnati Ohio General Election

• John McCain 48%
• Barack Obama 46%

Which presidential candidate will do the best job of improving our economy?

• Obama 47%
• McCain 44%

Survey of 876 likely voters was conducted October 4-8. The margin of error is +/- 3 percentage points.

Now that we know how to read a poll, we see that the pollsters interviewed 876 likely voters and claim, at a 95% confidence level, that if the election had been held on October 4-8, McCain would have received somewhere between 45% and 51% of the vote and Obama somewhere between 43% and 49%. In particular, notice that these intervals overlap by quite a bit! So, based on this poll, it is quite possible that Obama would beat McCain in Ohio. McCain’s lead makes it more likely that he would win, but it’s important to realize that this poll is not saying that McCain would beat Obama by 2% in Ohio!
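There is a further subtlety: the ±3 points applies to each candidate's share separately, and the uncertainty in the *lead* is larger. Here is a sketch, assuming the poll is a simple random sample and using the normal approximation. Because both shares come from the same 876 respondents, one candidate's gain tends to be the other's loss, and the variance of the lead $p_1 - p_2$ works out to $(p_1 + p_2 - (p_1 - p_2)^2)/n$.

```python
import math

def lead_margin_of_error(p1, p2, n, z=1.96):
    """Approximate margin of error for the lead p1 - p2 within one poll.

    Assumes a simple random sample of n respondents and the normal
    approximation; p1 and p2 are the two candidates' shares as fractions.
    """
    # The two shares are negatively correlated (same respondents), so
    # var(p1 - p2) = (p1 + p2 - (p1 - p2)**2) / n.
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(variance)

lead = 0.48 - 0.46
moe = lead_margin_of_error(0.48, 0.46, 876)
print(f"lead = {100 * lead:.0f} points, +/- {100 * moe:.1f} points")
```

This prints a lead of 2 points, plus/minus about 6.4 points, so the interval for McCain's lead runs from roughly -4.4 to +8.4 points and comfortably includes zero.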

Besides the statistical uncertainty that the margin of error and confidence level warn you about, there is plenty of room for human error. More on that below.

[Image: a Festivus pole]

Well, somebody is doing the poll and somebody is paying for the poll.  Often intentional or unintentional biases sneak in.  What are some of the ways this happens?

First of all, how you select the people to poll is important. A poll is supposed to represent the opinions of some group of people. Most polls about the presidential election are designed to represent the opinions of people likely to vote in the election. So when people are selected for a presidential poll, people in France, children, etc. are excluded. For the presidential election, it’s reasonable not to include the opinions of French people, because the poll is meant to help predict who might be elected and the French won’t be voting in the election. So first you need to know what population of people the poll is meant to represent.

Maybe in our ice cream example the poll is meant to represent adults living in the US. In that case, a cross section of US adults should be represented among the people polled: men and women, various regions of the country, older and younger people, etc. This is very hard to do. For example, if you conduct your poll by phone, you’re automatically biasing it towards people who own home phones. People who don’t have a phone, or who use only a cell phone or only an internet-based phone, would all be excluded. It’s easy to skew the population of people you’re polling without even knowing it!
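The phone example can be made concrete with a small simulation. All numbers here are hypothetical, chosen only for illustration: suppose 80% of adults can be reached by home phone and 65% of them prefer vanilla, while only 45% of the remaining 20% do. A poll that can reach only home-phone owners measures that group, not the whole population, and polling more people does not fix that.

```python
import random

def simulate_phone_poll(n, landline_share, p_landline, p_cell_only, seed=0):
    """Return (true share, phone-poll estimate) for a hypothetical population.

    landline_share: fraction of adults reachable by home phone.
    p_landline, p_cell_only: fraction of each group preferring vanilla.
    All of these are made-up illustration values, not real survey data.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    # Share of the whole population that prefers vanilla.
    true_share = landline_share * p_landline + (1 - landline_share) * p_cell_only
    # The poll only reaches landline owners, so every respondent is drawn
    # from that group no matter how large n is.
    yes = sum(rng.random() < p_landline for _ in range(n))
    return true_share, yes / n

for n in (1_000, 100_000):
    true_share, estimate = simulate_phone_poll(n, 0.8, 0.65, 0.45)
    print(f"n = {n:>7}: true {true_share:.1%}, phone poll {estimate:.1%}")
```

With these made-up numbers the true share is 61%, but the phone poll settles near 65% whether it reaches 1,000 or 100,000 people. The bias does not shrink with sample size, and here it is larger than a typical ±3 point margin of error.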

Second, how you phrase questions has a big effect. For example, the online gambling industry did a poll in 2006 on whether or not the government should regulate online gambling. Even before you look at the poll, you can bet they had an outcome in mind. An example question:

Q: Many gambling experts believe that Internet gambling will continue no matter what the government does to try to stop it. Do you agree or disagree that the federal government should allocate government resources and spend taxpayer money trying to stop adult Americans from gambling online?

• Agree 11%
• Disagree 77%
• Not sure 12%

When you phrase the question that way, it’s no surprise that you get that outcome! Another example of this is described here. By the way, both examples come from the polling firm Zogby International. Makes you wonder about their integrity!

Third, the order of the questions can influence a poll. Here is an example, described here, from a 2007 Fox News poll:

39. Who do you trust more to decide when U.S. troops should leave Iraq — U.S. military commanders or Members of Congress?

• 69% Commanders
• 18% Congress
• 7% (Both)
• 3% (Neither)
• 3% (Don’t know)

40. Last week the U.S. House voted to remove U.S. troops from Iraq by no later than September 2008 — would you describe this as a correct and good decision or a dangerous and bad decision?

• 44% Correct and good
• 11% (Don’t know)

If you looked only at question 40, you might think it is a reasonable poll question (even that is questionable!), but it’s hard to imagine that question 39 didn’t have at least an unconscious effect on people’s answers to question 40.

Of course, these sorts of problems show up in all sorts of polls.  And even when the pollsters are being super careful, there are subtle influences affecting the poll.

Moral:  No poll is completely unbiased!

At the very least, when you look at a poll, as a math person you should check the sample size, margin of error, and confidence level and think about what they tell you. Then look at who did the poll and what questions they asked, to see what influence that might have on the results. A summary of things to look for in a poll is given here.

One way to increase the accuracy of your information is to lump a bunch of polls together.  It’s a little like doing one big poll.  Websites such as realclearpolitics.com and electoral-vote.com do exactly this.  Other websites like fivethirtyeight.com do thousands of computer simulations of elections based on the polling data and use that to make predictions.  However, different polls ask different questions, use different methods to select people to poll, and may even be intentionally biased.   So the websites have to make decisions about which polls to include or exclude.  So even these websites can have biases!
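Here is the simplest version of lumping polls together, as a sketch. It assumes every poll is an independent simple random sample of the same population asking the same question, which, as just noted, is never quite true; it is not the method any of the websites above actually use. Each poll's result is weighted by its sample size, and the margin of error is recomputed for the combined sample. The three polls in the example are hypothetical.

```python
import math

def pool_polls(polls, z=1.96):
    """Combine polls as if they were one big simple random sample.

    polls: list of (share, sample_size) pairs, with share as a fraction.
    Returns (pooled_share, margin_of_error).
    """
    total_n = sum(n for _, n in polls)
    # Weight each poll's result by how many people it interviewed.
    pooled = sum(share * n for share, n in polls) / total_n
    # Margin of error for the combined sample (normal approximation).
    moe = z * math.sqrt(pooled * (1 - pooled) / total_n)
    return pooled, moe

# Hypothetical polls: (share for one candidate, number of people polled).
polls = [(0.48, 876), (0.50, 1000), (0.46, 600)]
share, moe = pool_polls(polls)
print(f"pooled: {share:.1%} +/- {moe:.1%}")
```

The combined estimate (about 48.3% ± 2.0 points) has a smaller margin than any single poll, each of which is around ±3 to ±4 points on its own. That gain is only real if the polls really are comparable, which is exactly the judgment call the aggregators have to make.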

## 7 thoughts on “Mathematics of Elections: Polling”

1. Thanks for posting the article, was certainly a great read!

2. Great information in regard to polling. My stats professor was very quick to point out the often unmentioned “margin of error” that many TV hosts rarely elaborate on. On another note, I read about an interesting survey that evaluated the same primary polls used to measure Obama/McCain and Kerry/Bush in 2004. Interestingly enough, this study found Kerry to lead Bush at virtually the same margin as Obama now leads McCain. I am not quite sure if this has any real predictive power when it comes to the election winner, but I found it interesting nonetheless.

3. I find this article very informative, also. When I saw polls reported on television, I used to take them at face value. In my statistics class, we’ve been relating polling to our topics, and one point was that polls do not always give a good representation because there is always some bias somewhere. As I get deeper into the subject of statistics, I am beginning to question what all of these polls are telling me. I now always want to ask how the polling took place, who was involved, how they were selected, etc. I feel this is a very important subject that everyone, not just people who enjoy math, should try to understand, so that no one takes polls at face value, because they are sometimes misleading.

4. This is very interesting. I did not understand these types of polls very well before reading this article. It now makes me question the accuracy of polls. I assumed that polls can be biased very easily, but I never thought that every poll is biased in some way. Also, it is easy for the people conducting a poll to introduce bias. Now that I know this, I understand how difficult it is to make a poll as unbiased as possible. Now, I will look at the sample size and margin of error in a poll to see if the poll is truly representative of the population.

5. This article is very important to understand, especially now that the election is in its final stretch. Usually I would take a poll at face value, but now that I see how far off it could be, I will question these polls from now on. With the election only 12 days away, there are numerous polls on news channels such as Fox News, CNN, MSNBC, and many more. For example, I found a recent Quinnipiac University poll on foxnews.com covering the 3 big battleground states. The results are below.
• Florida: Obama 49, McCain 44
• Ohio: Obama 52, McCain 38
• Pennsylvania: Obama 53, McCain 40
The margin of error is 2.6 to 2.7 points. This means that each of these races could be closer or farther apart than reported. Florida, for example, could be a dead heat at roughly 46.5 each. From this article, I have learned to check the sample size and margin of error to see how accurate a poll is. This article was very interesting, and Americans should be aware of this information so they don’t always take a poll at face value.

6. It is interesting how the margin of error and level of confidence create a layer of probability behind the statistics. It just goes to show that nothing is ever exactly as it seems.

It is also interesting how psychology can be used in different ways, depending upon the intent of the user. Research psychologists use it to reduce bias and obtain more authentic data, but some of the example polls above clearly use it to bias the data. The irony is that such biasing effects were probably confirmed through research psychology, so the (ideally) unbiased research psychology was used to develop techniques used to skew data based on bias.

7. This certainly sheds light on how polls taken merely hours apart can vary so much. Jaded views and bias are fueled by politics, and in an effort to prove points, the truth rarely comes out. Even in the debates, it wasn’t clear what each politician’s points were, as they focused on establishing the opponent as the weaker of the two.
I love how math can prove political fallacies like this.