Electoral College Meta-Analysis (election.princeton.edu)

From Prof. Sam Wang of Princeton University.

This page is available online at http://synapse.princeton.edu/~sam/pollcalc.html

Below is a meta-analysis directed at the question of who would win the Electoral College if the election were held today. Meta-analysis provides more objectivity and precision than looking at one or a few polls, and in the case of election prediction gives a more accurate current snapshot. Many of you agree - this site gets tens of thousands of visitors every day. A backup site is also available.

These calculations are based on all available state polls, with an emphasis on likely voter data that include Nader where he is on the ballot. Three or more recent polls (up to seven days old) for each state are averaged, and the standard error of the mean is used to calculate the probability of every combination of possible state results. The map is not identical to the median; the reason is explained below the map. Results are defined as not statistically significant (n.s.) if the probability is between 5% and 95%. The effects of turnout are not included, but can be calculated using the bias analysis. See below for an explanation of the methods. Your polling-related comments are welcome.
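The combination step can be sketched in Python. This is a minimal sketch under stated assumptions, not the site's MATLAB script. It assumes state outcomes are independent. It builds the exact electoral-vote distribution by convolving one state at a time, which gives the same answer as enumerating all 2^n combinations but runs much faster. The EV counts and probabilities in the example are hypothetical. The sketch also shows why a map that colors each state for its >50% favorite need not agree with the median.

```python
def ev_distribution(states):
    """Exact distribution of one candidate's electoral votes.

    states: list of (electoral_votes, win_probability) pairs, assumed independent.
    Returns dist, where dist[k] = P(candidate wins exactly k electoral votes).
    """
    dist = [1.0]  # before any state is counted: 0 EV with certainty
    for ev, p in states:
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)  # candidate loses this state
            new[k + ev] += prob * p     # candidate wins this state
        dist = new
    return dist


def ev_percentile(dist, q):
    """Smallest EV total whose cumulative probability reaches q."""
    total = 0.0
    for k, prob in enumerate(dist):
        total += prob
        if total >= q:
            return k
    return len(dist) - 1


def map_total(states):
    """EV total from coloring every state whose win probability exceeds 50%."""
    return sum(ev for ev, p in states if p > 0.5)


# Hypothetical example: 250 safe EV plus three 10-EV states, each at 60%.
states = [(250, 1.0), (10, 0.6), (10, 0.6), (10, 0.6)]
dist = ev_distribution(states)
median = ev_percentile(dist, 0.5)
low, high = ev_percentile(dist, 0.025), ev_percentile(dist, 0.975)
p_270 = sum(dist[270:])
p_tie = dist[269] if len(dist) > 269 else 0.0
print(median, (low, high), round(p_270, 3), map_total(states))
# prints: 270 (250, 280) 0.648 280
```

Here the map shows 280 EV while the median is 270. Each 60% state is individually more likely than not to be won, but winning all three at once happens only 21.6% of the time.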

I do not take donations. If you would like to express your support, you are welcome to do so politically through ActBlue. Republicans may donate through the National Republican Senatorial Committee. You can also use this site to help you decide where to hit the streets to get out the vote.

Friday, October 29, 2004 at 10:00AM EDT (commentary only)

Predicted median with undecideds: Kerry 294 EV, Bush 244 EV (probability map)

Median outcome, decided voters only: Kerry 274 EV, Bush 264 EV (probability map) (Trends to 10/12)

95% confidence interval, decided voters only: +/-36 EV for each candidate (Kerry >=270EV: 61%, n.s.)

Popular Meta-Margin among decided voters (explanation): Kerry leads Bush by 0.3%

Commentary (Recent comments)

Friday 10:00AM: Tune in to FOX News on Saturday (tomorrow) around 2:45PM, when I am scheduled to talk about the meta-analysis.

I am traveling today until evening, so the next update may be late in coming. If nothing shows up in the next four hours, check back late tonight. I also hope to have the briefing sheet I promised before, but other events have slowed me down a bit and I might not have a good version until Sunday. In the meantime, the fastest state poll updates can be found at race2004.net and The Hedgehog Report.

Thursday 8:15PM: More letters.

Thursday 7:00PM: Now that so many polls exist in key states (for instance 13 in Florida), I can finally switch from mean to median. The result is still essentially a tie, but the median gives improved accuracy in hotly contested states with many polls and is resistant to outliers in either direction. The only state to flip sign because of this change is Hawaii, which has only three polls and switches to Bush. Friday: It's been pointed out to me that a switch to median would require recalculating the history over time. It's also a bit late to switch methods, which makes clear interpretation problematic. Developing...

Today I also revise my estimate of undecided voters. Based on the national two-way numbers below, assigning 2.0% to Nader and other candidates leaves 2.0% of voters undecided. Using Cook's rule for incumbents and previous presidential results, I estimate an advantage of 1.0 ± 0.8% to Kerry, down from my previous estimate of 2.5 ± 2.0%.

New polls from Quinnipiac (FL, PA), ARG (IA, OR, WI), Strategic Vision (R) (MI, MN, NJ), and Mason-Dixon (WA). Of key polls completed in the last seven days:

Florida (13 polls): Bush leads in 6, Kerry leads in 3, and 4 ties (median ± estimated SEM, tie 0.0 ± 0.8%).

Ohio (9 polls): Kerry leads in 5, Bush leads in 4 (Kerry by 1.6 ± 1.3%).

Pennsylvania (11 polls): Kerry leads in 9, Bush leads in 1, and 1 tie (Kerry by 2.8 ± 0.9%).

National two-way choice (11 polls): the median result is Bush 48.5 ± 0.4%, Kerry 47.5 ± 0.7% (unchanged since yesterday). National polls come from RealClearPolitics, davidwissing.com, pollingreport.com, and yougov.com.

Thursday 3:00PM: Edward Witten writes, "On mydd.com, I read yesterday a rumor that a NYT poll of Florida showing Kerry ahead by +9 percent was buried as being implausible. I don't know if the rumor is true, and if it is I am sure the poll was flawed, as Kerry is surely not leading Florida by that amount. But to me it illustrates the fragility of trying to predict the election from the available state polls. Including or excluding a single, undoubtedly flawed, poll showing a +9 percent lead in Florida for Kerry (or Bush) would probably have a significant impact on your overall assessment of the outcome of the election." To some extent he has a point. If such a poll exists, the decided-voters median moves to Kerry 271 EV, Bush 267 EV, with a Kerry win probability of 51%. Using the median rather than the mean partly circumvents this problem, and I will make that switch soon. In any event, the confidence interval is +/-36 EV, so neither result is statistically significant. Any way you slice it, the election is a toss-up and will depend on turnout and undecideds.
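To see how the median blunts a single outlier, here is a Python sketch. It uses made-up Florida margins (Kerry minus Bush, in points), with and without a hypothetical +9 poll. The SEM of the median uses the large-sample normal approximation sqrt(pi/2) x SD / sqrt(n). The text does not say how its "estimated SEM" is computed, so that formula is an assumption.

```python
import math
import statistics


def summarize(margins):
    """Mean and median of poll margins, each with an approximate standard error."""
    n = len(margins)
    sd = statistics.stdev(margins)
    sem_mean = sd / math.sqrt(n)
    # Assumption: large-sample approximation for the median of normal data.
    sem_median = math.sqrt(math.pi / 2) * sem_mean
    return statistics.mean(margins), sem_mean, statistics.median(margins), sem_median


florida = [-3, -2, -1, 0, 0, 1, 2]   # hypothetical margins, Kerry minus Bush
with_outlier = florida + [9]          # add one implausible +9 poll

for label, data in (("without", florida), ("with", with_outlier)):
    mean, se_mean, med, se_med = summarize(data)
    print(f"{label} outlier: mean {mean:+.2f} ± {se_mean:.2f}, median {med:+.2f} ± {se_med:.2f}")
```

In this example the single outlier moves the mean by more than a point toward Kerry, while the median stays at zero.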

Thursday 9:45AM: Since posting a few letters here I have received many more. Here are some selected from yesterday's mail, October 27. Of special interest is one from Jim G. from New Hampshire, an undecided voter. He is articulate and thoughtful, and though we don't agree on a few things I strongly recommend his letter to all of you.

Evidently I spoke too soon regarding Bush giving up on Ohio. His travel plans include three stops there between now and Election Day. He is also pushing in Michigan, where Kerry has recently slipped a bit.

To revisit my previous calculation of where you are most effective in your door-to-door activism: the current value of a single vote in the top states is (measured in kilojerseyvotes): Hawaii 29.1, Colorado 6.1, Minnesota 4.6, Florida 4.4, and Ohio 4.3. If sentiments shift 1% towards or away from Kerry relative to polls, the top states are the same except that Minnesota is replaced by Iowa. Other values of relevance are New Hampshire 2.4, Nevada 1.9 and Pennsylvania 1.4. (Parenthetically, the jerseyvote is declining in value like a Weimar Reichsmark.)

Thursday 7:00AM: New polls from U. Arkansas (AR), Rasmussen (AZ), Fairbank and Associates (D) (CO), Zogby (CO, FL, IA, MI, MN, NM, NV, OH, PA, WI), L.A. Times (FL, OH, PA), Honolulu Star-Bulletin (HI), Mitchell Research (MI), Humphrey Institute (MN), Saint Cloud Univ. (MN), Research 2000 (R) (MO), Mason-Dixon (NC, VA), Quinnipiac (NJ), Rasmussen (NM, NV), Moore (OR), Gallup (PA), and Survey USA (WA). (Whew.) For the decided-voters calculation, the map and the median now favor different candidates. Such things can happen when multiple probabilities are in the 20-80% range.

Of key polls completed in the last seven days:

Florida (12 polls): Bush leads in 5, Kerry leads in 3, and 4 ties (average ± SEM, Bush by 1.3 ± 1.0%).

Ohio (9 polls): Kerry leads in 5, Bush leads in 4 (Kerry by 0.7 ± 1.2%).

Pennsylvania (10 polls): Kerry leads in 9 and 1 tie (Kerry by 2.9 ± 0.7%).

National two-way choice (10 polls): the median result is Bush 48.5 ± 0.3%, Kerry 47.0 ± 0.5%.

Wednesday 5:30PM: A brief note on the vagaries of opinion polling. When we read polls we often make the implicit assumption that people report what is really going on inside their heads. However, this is a subjective report. The famous example in these closing days is the "undecided voter." But are these people undecided in the sense that we mean colloquially? Are they one monolithic category of person?

It's been pointed out that many undecided voters view the incumbent unfavorably, and that undecideds usually break for the challenger. This phenomenon may simply reflect the fact that some people are unable or unwilling to state a set preference. To cite a homely example, you may find yourself unable to articulate what you want for dinner, but you can react immediately to what you don't want.

Recently Scott Rasmussen reported data that he says support the notion that late-deciding voters prefer Bush. The survey was based on only 136 late-deciding voters, far too few to reach statistical significance. This is a message poll aimed at driving the discussion in his preferred direction. Also, the survey assumes that voters who decided during the survey period are similar in characteristics to those who wait until the last minute, possibly until they are standing in the voting booth. This assumption is untested.
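For a sense of scale, here is the usual 95% margin of error for a proportion. It assumes simple random sampling and a worst-case share of 50%. With 136 respondents the margin is about ±8.4 points on either candidate's share. On the gap between the two candidates in a two-way split, it is about twice that.

```python
import math


def margin_of_error(n, share=0.5, z=1.96):
    """95% margin of error (as a fraction) for a share estimated from n respondents."""
    return z * math.sqrt(share * (1.0 - share) / n)


for n in (136, 1000):
    moe = margin_of_error(n)
    # In a pure two-way split the margin is 2*share - 1, so its error doubles.
    print(f"n={n}: share ±{100 * moe:.1f} pts, two-way margin ±{200 * moe:.1f} pts")
```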

A parting thought on undecided voters: we are not going to resolve this by further argument! The best we can do is come up with a way to measure what they do, and wait until after the election. I will try to provide this as part of my final Election Night briefing document.

Other examples of respondent inaccuracy are the party-ID question, which can depend on when in the survey it is asked (especially if asked after the presidential preference!) and the question of who people voted for in the last election (on average, people show a tendency to think that they voted for the winner even if they did not).

Finally, once again: the probability map is not the same as the median calculation, which is why they do not match. If you were thinking about writing me to point this out, first read the explanation below the map.

Wednesday 12:30PM: My regular email address works once again. Send your correspondence there.

Charles Cullen writes to ask today's probability of a 269-269 electoral tie. Using decided voters only, it's 3.9% - a lot! With the undecideds assumption, it's 0.4%. In this scenario the newly elected House would choose the president and the newly elected Senate the vice-president, leading to Bush-Cheney (if the Senate remains Republican) or Bush-Edwards (if the Senate goes Democratic).

Wednesday 7:00AM: One of the pleasures of running a popular Web site is the correspondence. Here are some selected letters of various types from the last few days: illuminating, entertaining, and unintentionally hilarious.

Wednesday 5:30AM: Hawaii has been added because of two recent polls showing possible leads for Bush. This seems very unlikely. In any event, what is really needed is a third poll.

With that, let's think about a favorite subject of mine: why individual polls seem surprising or contradictory. I can think of three reasons:

1. Reporters often don't understand statistics. A poll showing Bush up by 5% and another showing Kerry up by 1% are in fact consistent with one another because of random sampling error. For more on this read yesterday's entry by Mystery Pollster (Mark Blumenthal). A better way to get a good answer is to examine many polls at once. For the record, Charles Forelle at the WSJ is a very notable exception - in his article about this Web site and others, he captured the subject perfectly!

2. Man bites dog. When a poll's finding sounds interesting, it gets more attention than a boring result. Therefore reports of outliers tend to grab headlines, creating apparent discrepancies.

3. Competition among organizations. News organizations usually rely on their own data alone. If they do this, they cannot achieve the increased accuracy that comes from comparing multiple polls. Indeed, little incentive exists to improve accuracy, since low accuracy leads to more frequent news stories, and therefore more readers or viewers.
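The first reason can be checked with a quick calculation. The sketch below takes two hypothetical national polls of 1,000 respondents each, with both candidates near 48%. It asks whether a Bush +5 result and a Kerry +1 result differ by more than sampling error. The standard error of a margin uses the multinomial formula sqrt((p_a + p_b - (p_a - p_b)^2) / n). The sample sizes and shares are assumptions for illustration.

```python
import math


def margin_se(p_a, p_b, n):
    """Standard error of the margin p_a - p_b from one poll of n respondents."""
    return math.sqrt((p_a + p_b - (p_a - p_b) ** 2) / n)


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


se = margin_se(0.48, 0.48, 1000)         # about 3.1 points per poll
bush_plus_5, kerry_plus_1 = 0.05, -0.01  # margins expressed as Bush minus Kerry
z = (bush_plus_5 - kerry_plus_1) / math.sqrt(se ** 2 + se ** 2)
print(f"z = {z:.2f}, p = {two_sided_p(z):.2f}")
```

The 6-point gap gives z of roughly 1.4, well short of the 1.96 needed for significance at the 95% level.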

Tuesday 11:00AM: Statistically based analysis of the Electoral College is featured in today's Wall Street Journal. Welcome to new readers!

The overall raw polling outcome (decided voters only) is still a statistical tie. This is true even with more than 100 polls used in today's calculation. Bush has tiny leads among decided voters in Florida and Ohio, indicating that the outcome in these key states will be determined by undecided voters and turnout.

Finally, a note on national polls. In 8 national polls (two-way choice) the median result is Bush 49, Kerry 47. Assuming that methods are similar, the fact that this margin is larger than the Meta-Margin above supports the idea that the distribution of support across states gives Kerry a small advantage.

Saturday 10:00AM: I am working on a reference sheet to give you late next week. In addition to bottom-line predictions, this reference will give you a list of things to watch for on Election Night, along with key combinations that Kerry and Bush need. The content will change a bit in the coming week as the last polls come in. However, some outlines are now coming into view.

Under today's polling conditions, four states are clearly in serious contention: Florida, Iowa, Ohio, and Wisconsin. To a lesser extent so are NV, WV, and some others. Depending on undecideds/turnout/bias, states come into or go out of play, but in those situations Kerry or Bush typically wins the Electoral College by a more comfortable margin. So let's concentrate on this near-tie condition. After assigning other states as indicated by polls (PA and MI to Kerry, MO to Bush, and so on) and playing with combinations, several patterns emerge.

First - if Kerry wins Florida, the election is over - he wins. Kerry can also win by taking Ohio plus one of the smaller states. In the other direction, Bush must win not only Florida, but also either Ohio or all of the smaller states. In light of these facts, the Saletan piece (see below) indicates that the Bush campaign's actions may amount to a defensive move - otherwise why give up on Ohio?

It's also possible to identify states that look moderately solid, but might flip if the combination of undecided/turnout/bias factors adds up. This is interesting because this shift is likely to be similar across states. Therefore these states can act as an early-warning system for a surprising election night. For instance, Arkansas, North Carolina and Virginia currently look like Bush states, and Maine looks like a Kerry state. If a surprise occurs in any of these states, this might presage a significant offset between decided-voter polls and the real outcome.

Wednesday 4:00PM: I've received lots of feedback on undecided voter assignment, much of it constructive. This has led me to rearrange the way the results are presented.

First: the calculation is now set to its old definition from two days ago. Many of you are very familiar with the raw (decided voters only) calculation by now. Switching back was suggested by many readers of various political persuasions. Whether people liked the direction of the outcome or not, many were uncomfortable with the mixing of current numbers and previous election outcomes. Also, this site has many calculations that are based on decided voters only, and it only adds confusion to redo those.

Second: the assignment of undecided voters is now done probabilistically, like the rest of the calculation. Past elections from 1956 to 1996 show a wide range of undecided breaks for the challenger: [+3 +6 +2 +1 +6 +0 +2 +4], median 2.5%, estimated SD 2.0% (analysis). This year may be unusual, though note in 1972 a break of 2% away from Nixon, at the height of the Vietnam conflict and after the invasion of Cambodia. Anyway, because the contribution is variable, the undecideds-assigned calculation (MATLAB script) takes this variation into account. The results are listed in the box above. This is my own current prediction. Also, the state probabilities are now given both with decideds only and with undecideds added. Thanks to Alan Cobo-Lewis and Rachel Findley for key discussions.
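The summary statistics and the probabilistic assignment can be sketched as follows, using the break values listed above. The text does not spell out exactly how the site's script folds the variable break into each state's probability. The sketch therefore makes one simple assumption: the break is a normal random shift with the historical center and spread, independent of polling error, so the two variances add.

```python
import math
import statistics

breaks = [3, 6, 2, 1, 6, 0, 2, 4]   # undecided break toward the challenger, points
center = statistics.median(breaks)  # 2.5
spread = statistics.stdev(breaks)   # about 2.2 (about 2.1 with pstdev), close to 2.0


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_prob_with_undecideds(margin, sem, shift_mean=2.5, shift_sd=2.0):
    """P(challenger carries a state) once an uncertain undecided break is added.

    Assumption: break ~ Normal(shift_mean, shift_sd), independent of the
    polling error ~ Normal(0, sem), so the two variances add.
    """
    return normal_cdf((margin + shift_mean) / math.sqrt(sem ** 2 + shift_sd ** 2))


# Hypothetical state tied among decided voters, with an SEM of 2 points:
print(round(win_prob_with_undecideds(0.0, 2.0), 2))
```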

Third: There are now two maps (see box). The static image below is set with undecideds assigned.

Now, to the interesting bits. Look at the state probabilities. Because the undecideds could break evenly or for the challenger, many states are still toss-ups, including Florida and Ohio. The lingering uncertainty reinforces the idea that the election is close enough to be determined by turnout. Even if the undecideds break evenly, a 2% difference in turnout could change the result drastically, which you can see in one direction by comparing the maps. Have I mentioned before that I think turnout is important? Turnout is very, very important.

Whew, that was tiring. I think I need a wee dram!

Tuesday noon: Today I implement the first major change to this calculation - I am allocating undecided voters. To do this I use past presidential election voting patterns, specifically the incumbent rule as described by Charlie Cook of the National Journal. This gives a more accurate snapshot and is a step toward making an actual prediction.

Rationale: It is known to poll analysts that voters who are undecided usually end up voting against the incumbent. In particular, compared with their final poll numbers, incumbents get between 2% less and 1% more. In contrast, challengers do better on average by 3%. These figures are consistent with Cook's estimate that undecideds split at least 75% for the challenger. In today's summary of national polls, the average Bush-Kerry split is 48.5-45.5, which sums to 94%. Assuming 2% for Nader and other candidates, the remaining undecideds are 4%. Splitting these by Cook's rule gives 1% to Bush and 3% to Kerry, reducing the margin by 2%.
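Here is the arithmetic of the rationale above, written out as a small Python function. Both the 75/25 split (Cook's lower bound for the challenger's share) and the 2% for minor candidates are parameters, taken from the paragraph's own assumptions.

```python
def cook_split(incumbent, challenger, other=2.0, challenger_share=0.75):
    """Allocate undecided voters by the incumbent rule (all values in percent).

    Returns (undecided, to_incumbent, to_challenger, net_shift_to_challenger).
    """
    undecided = 100.0 - incumbent - challenger - other
    to_challenger = undecided * challenger_share
    to_incumbent = undecided - to_challenger
    return undecided, to_incumbent, to_challenger, to_challenger - to_incumbent


print(cook_split(48.5, 45.5))  # -> (4.0, 1.0, 3.0, 2.0), the national numbers above
print(cook_split(48.5, 47.5))  # -> (2.0, 0.5, 1.5, 1.0), the October 28 two-way numbers
```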

Therefore, for the main calculation I will assume that the undecided-voter shift is +2.0% towards Kerry, shift state polls by this amount (using the variable already provided in the script), and proceed with the calculation. Based on state polls in Florida, Ohio and Pennsylvania, I estimate that the proportion of undecided voters in these states is similar to the national figures. Because national polls come more frequently, I will use them to calculate the shift. The size of this shift may change in the final days, and I will be monitoring this.

This new estimate is likely to be more accurate. However, it is also the first change to the calculation that is not neutral: it goes beyond the polling numbers themselves, and it is in a direction that is favorable to my candidate. For example, Florida, Iowa, Ohio and Wisconsin are still toss-ups, but they are now above the 50% probability threshold for Kerry. Therefore I will continue to report the results without this adjustment. This is listed in the box above on the line labeled "Decided voters only." The corresponding Meta-Margin can be calculated by subtracting 2.0% from the value listed.

To read more about the incumbent rule, see Charlie Cook, Guy Molyneux, the Los Angeles Times, Mark Shields, the Mystery Pollster, and a contrarian.

I have also simplified the box by removing the line about the Colorado ballot initiative, which, based on a recent poll and Salazar's opposition, seems likely to fail.

Finally, I will continue listing rankings and probabilities for all states. I have decided that there is little benefit to leaving these out.

Hitting the streets: How much do you affect the election by getting out the vote? Also, where are your efforts most valuable? To help guide your efforts, here is a synthesis of previous posts. Once you decide, I recommend contacting your local Democratic (or Republican) organization or America Coming Together.

This question can be answered by calculating how much the Electoral College win probability is changed by one person's vote. This affects where you should go because as an individual, you can only get out a finite number of votes. Today the best states to go to are Iowa, Ohio, Nevada, and Florida. Nevada, while small, is on the list because it is a near-tossup and has relatively few voters per electoral vote. Here is a case study. If you are a New Jersey resident, your vote has some value, but it is low since the state is very likely to go Democratic by a substantial margin. In contrast, driving a voter to the polls in Pennsylvania is worth nearly 300 times as much. If you go to Ohio, each vote is worth even more, over 500 "jerseyvotes." The top states are IA (686 jerseyvotes), OH (528), NV (508), FL (372), NM (304), WI (295), PA (295), MO (199), AR (151).

Bias analysis: The potential effects of differential turnout, splitting undecided voters, or systematic polling bias are as follows. The baseline from which bias is defined is decided voters only. Decisions by undecided voters and get-out-the-vote activities on Election Day will be major determinants of how large this bias effect is.

3 points towards Kerry: Kerry 322 EV, Bush 216 EV, Kerry win 99.97%.
2 points towards Kerry: Kerry 311 EV, Bush 227 EV, Kerry win 99%.
1 point towards Kerry: Kerry 296 EV, Bush 242 EV, Kerry win 90%.
no swing (decideds only, flat turnout): Kerry 274 EV, Bush 264 EV, Kerry win 61%.
1 point towards Bush: Kerry 257 EV, Bush 281 EV, Bush win 76%.
2 points towards Bush: Kerry 237 EV, Bush 301 EV, Bush win 95%.
3 points towards Bush: Kerry 220 EV, Bush 318 EV, Bush win 99.4%.
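A table like this can be generated by shifting every state's margin by the same amount and recomputing. The sketch below does this for a few hypothetical swing states. The margins, SEMs, and the 230 "safe" EV are invented, and a normal distribution for polling error is assumed. Because the whole list moves together, a one-point shift can move many states at once, which is why the EV totals change so much per point.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ev_distribution(evs_and_probs):
    """Exact EV distribution for independent states given (ev, p) pairs."""
    dist = [1.0]
    for ev, p in evs_and_probs:
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)
            new[k + ev] += prob * p
        dist = new
    return dist


def with_bias(states, safe_ev, bias):
    """Median EV and P(>= 270) after adding `bias` points to every state's margin."""
    probs = [(ev, normal_cdf((margin + bias) / sem)) for ev, margin, sem in states]
    dist = ev_distribution(probs)
    cumulative, median = 0.0, len(dist) - 1
    for k, prob in enumerate(dist):
        cumulative += prob
        if cumulative >= 0.5:
            median = k
            break
    p_win = sum(dist[max(270 - safe_ev, 0):])
    return safe_ev + median, p_win


# Hypothetical swing states: (electoral votes, margin in points, SEM in points)
swing = [(27, -0.5, 2.0), (20, 0.5, 2.0), (10, -1.0, 2.0), (7, 0.0, 2.0), (5, -2.0, 2.0)]
for bias in range(-3, 4):
    median, p_win = with_bias(swing, 230, bias)
    print(f"{bias:+d} points: median {median} EV, P(>=270) {p_win:.2f}")
```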

Although the calculation is unbiased, I am not. I am a Democrat. To see a list of races I consider critical, see my ActBlue page. My advice to all voters (including Republicans) is the same: Go to battleground states. Register voters. Make phone calls and knock on doors (a very effective strategy) to canvass for voters. Vote absentee or vote early (online resource), and on Election Day, work to get out the vote.

State-by-State Probabilities

Current percentage probabilities of a Kerry win in each battleground state, computed from the last three polls or going back seven days, whichever gives more polls. States in boldface had a new poll completed and released since the last update. The probabilities are calculated assuming that the SEM cannot go below 2%. Click on a state to view a tabulation of most of the polls. Some of the others come from these data sources (some are subscription-only). Other sources are electoral-vote.com and RealClearPolitics. All data are visible in this MATLAB script.
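A per-state probability of the kind listed below can be sketched in four steps. Choose the polls by the "last three polls or last seven days" rule, average them, compute the SEM with the 2% floor, and convert the result to a probability. The normal CDF is an assumption; a t-distribution would be a reasonable alternative for so few polls. The poll data are hypothetical.

```python
import math
import statistics


def select_polls(polls, min_count=3, max_age_days=7):
    """polls: list of (age_in_days, margin). Use the last `min_count` polls or
    every poll within `max_age_days`, whichever set is larger."""
    by_age = sorted(polls)
    recent = [margin for age, margin in by_age if age <= max_age_days]
    last_n = [margin for age, margin in by_age[:min_count]]
    return recent if len(recent) >= len(last_n) else last_n


def state_win_prob(margins, sem_floor=2.0):
    """Probability the candidate leads, from the mean margin and a floored SEM.
    Needs at least two polls (statistics.stdev requires two data points)."""
    mean = statistics.mean(margins)
    sem = statistics.stdev(margins) / math.sqrt(len(margins))
    sem = max(sem, sem_floor)  # the SEM is not allowed to go below 2 points
    return 0.5 * (1.0 + math.erf(mean / (sem * math.sqrt(2.0))))


# Hypothetical polls: (days old, Kerry minus Bush in points)
polls = [(1, 2.0), (3, -1.0), (5, 1.0), (10, 4.0)]
chosen = select_polls(polls)
print(chosen, round(100 * state_win_prob(chosen)))  # prints: [2.0, -1.0, 1.0] 63
```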

Decided voters only: AR 26, AZ 0, CO 69, FL 50, HI 38, IA 31, ME 96, MI 99, MN 73, MO 2, NC 0, NJ 100, NV 8, NH 94, NM 2, OH 79, OR 100, PA 92, TN 0, VA 1, WA 100, WV 8, WI 4.

Undecideds assigned: AR 44, AZ 1, CO 82, FL 68, HI 57, IA 50, ME 98, MI 99, MN 81, MO 8, NC 0, NJ 100, NV 20, NH 96, NM 8, OH 89, OR 100, PA 96, TN 0, VA 3, WA 100, WV 20, WI 11.

Rank order of the battleground states. States currently in play in the 20-80% probability range, which indicates a near-tie, are in bold.

Decided voters only: Democratic <- OR/WA/NJ/MI/ME(95-100%) / NH / PA / OH / MN / CO / FL / HI / IA / AR / WV / NV / WI/MO/NM/VA/AZ/TN/NC(0-5%) -> Republican

Undecideds assigned: Democratic <- OR/WA/NJ/MI/ME/PA/NH(95-100%) / OH / CO / MN / FL / HI / IA / AR / WV / NV / WI / MO / NM / VA/AZ/NC/TN(0-5%) -> Republican

The map below shows states that, individually, each candidate would have a greater than 50% chance of winning if the election were held today. The map is drawn with undecided voters assigned. This is a different calculation from the median EC projections above. A map cannot easily show compound events (for example, when a candidate pulls out a surprise victory in one state but loses another), but the above probability projections take compound events into account. Thus the EC totals on the map may differ from what's given above. The closer a state is to a tossup, the closer it will be to white.

Click on the map to get an interactive pop-up (thanks to Drew Thaler). For difficult browsers, a static map is also available.

This is a history of the calculation. For this calculation each poll is assigned to the last date on which polling was done. The marked events are inspired by a similar graph by electoral-vote.com. In my graph, the effect of events is clearer because I use polling margins and because I average over three polls. Fahrenheit 9/11, adding John Edwards to the ticket, and the Democratic convention seemed to have measurable effects within a few days. The passing of Ronald Reagan and the assault on Kerry's war heroism did not. The last update was October 12. Note: Around the time of the first debate I started using more polls per state. This and the start of Rasmussen daily tracking polls has complicated updates. Therefore after September 25 the graph is simply a record of previous daily updates - not quite the same. For instance, in this graph the bounce from the first debate looks delayed. In fact, it was immediate. This graph will be done more properly soon.

History of meta-analysis over time

Selected comments from the author (all comments)

Monday, October 25, 2004, 6:30PM

First, I apologize for the relative lack of updates and commentary! I face logistical hurdles, including my professional society's annual meeting (where I am), intermittent Internet access, and university mail system failures.

This site is mentioned in an article on polls in today's Newsday. However, there is one error - my margin of Bush over Kerry counts decided voters only, and does not include undecided voters.

I no longer list the overall probability of a Kerry win with undecideds allocated. This is because the uncertainty of how undecideds will break is accounted for state by state, but the compound probability calculation assumes independence among states, which is unlikely. The true probability is roughly equal to the probability that the undecided advantage (which I assume is 2.5 ± 2.0% for Kerry) and the Meta-Margin (currently 0.5% for Bush) sum to a positive value for Kerry. Today this probability is around 75%. Stripped of stat-speak, what I mean is that, given the history of what undecided voters do, today I give Kerry 3-1 odds over Bush. The median EV count with undecideds assigned is still OK.
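The "sum to a positive value" calculation can be written out under one assumption: both quantities are normally distributed and independent. Treating the Meta-Margin as exact, as the sketch does by default, gives about 84%. Giving the Meta-Margin its own uncertainty pulls the result toward 50%. That uncertainty is the hypothetical `meta_sd` parameter, not a figure from the text.

```python
import math


def prob_positive(mean, sd):
    """P(X > 0) for X ~ Normal(mean, sd)."""
    return 0.5 * (1.0 + math.erf(mean / (sd * math.sqrt(2.0))))


def kerry_win_prob(undecided_mean=2.5, undecided_sd=2.0, meta_margin=-0.5, meta_sd=0.0):
    """P(undecided advantage + Meta-Margin > 0), in points, Kerry positive.

    meta_sd is a hypothetical uncertainty on the Meta-Margin itself.
    """
    total_sd = math.sqrt(undecided_sd ** 2 + meta_sd ** 2)
    return prob_positive(undecided_mean + meta_margin, total_sd)


print(round(kerry_win_prob(), 2))             # Meta-Margin treated as exact
print(round(kerry_win_prob(meta_sd=2.0), 2))  # with a hypothetical 2-point meta_sd
```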

Regarding the Hawaii question, it's possible that this state is competitive but right now there are not enough data to say. Stay tuned.

Sunday, October 24, 2004, 8:00PM

A story in today's Washington Post confirms what I have been suspecting: the Bush campaign is in trouble, and Bush-Cheney campaign insiders recognize this. It's consistent with the defensive move of pulling out of Ohio for a last stand in Florida.

By the way, I am having email troubles on the university server. To reach me cc your messages to mindgeek at gmail dot com.

Friday, October 22, 2004, 3:30AM

At Slate, Will Saletan points out that based on Bush's travels, his campaign may consider Florida more of a must-win state than Ohio. Looking at today's decided-only numbers, this has merit. If Bush takes Ohio his win probability is 72%, but if he loses it, this drops to 40% - not even a twofold difference. Florida is different: if Bush wins Florida, his win probability is 88%; if he loses, it's only 20%. To explain why, Saletan describes electoral scenarios involving smaller states (WI, IA, NV, NM) that Bush could cobble together to make up for the loss of Ohio.

October 18, 2004

Although I don't analyze national polls, I am asked about them frequently. For instance, how to interpret the latest Gallup poll reporting a Bush-Kerry margin of 8%? My brief reply: if you look carefully at all available polls, the race is closer than this single poll indicates. Consider the following.

Imagine that the race were perfectly tied and the margin of error were 4 points. In this case six measurements of the Bush-Kerry margin could easily be: Kerry +2, Bush +2, tie, Bush +6, Kerry +6, Kerry +1. Add the fact that the CNN/USA Today/Gallup poll is somewhat favorable to Bush compared with other polls, and one can see the problems with interpreting any one poll. If a national horserace summary is required for the sake of curiosity, then looking at an average (here is more data) or a median is better. If one does this, Bush is currently about 2 points up on Kerry among decided voters.
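Averaging the six hypothetical measurements above (Kerry margins positive) shows why a summary beats any single poll. Both the mean and the median land near zero. The standard error of the mean is also well under the 4-point error of one measurement.

```python
import math
import statistics

# The six example margins from the text, Kerry minus Bush, in points.
margins = [2, -2, 0, -6, 6, 1]

mean = statistics.mean(margins)
median = statistics.median(margins)
sem = statistics.stdev(margins) / math.sqrt(len(margins))
print(f"mean {mean:+.2f}, median {median:+.2f}, SEM {sem:.2f}")
# prints: mean +0.17, median +0.50, SEM 1.64
```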

This brings me to the biggest point of all: undecided voters are not counted in point spreads, yet history suggests that most of them vote against the incumbent. This suggests that Bush's true threshold separating victory from defeat is about 49%; he is currently slightly below that. This is the big story among pollwatchers this week. For a discussion see these L.A. Times and CNN articles.

October 15, 2004

Colorado Democratic Senate candidate Ken Salazar has come out clearly against Amendment 36, the electoral vote splitting initiative. It seems likely to fail.

I have been asked to evaluate a tactical-voting idea proposed by Nader supporters, votepair.org. The idea is for swing state voters to agree to change their Nader vote in exchange for a vote change in a non-swing state. Nader is a much smaller factor this year than in 2000, but it is still worth considering how much value you get for your trade. (This relies on the same calculation that I did on Wednesday the 13th to ask where you should get out the vote.)

Regarding Nader: Today the NY Times says that Nader is a threat to the Democrats. Possibly, but this is not supported by the data! In the nine states listed in the article, I looked at 20 polls since October 1 with both two-way results (Kerry-Bush) and three-way results (Kerry-Bush-Nader). Nader does not change the outcome in any of these - and several are ties.

October 14, 2004

The Washington Post hosts an online chat with Charlie Cook. This is excellent, a treat. He slams the quality of polling information available on the Internet. Despite being a purveyor, I agree with him. How's that for mind-bending?

The effects of the last debate won't show up until next week. Polls typically take at least two days to complete, and pollsters usually start fresh after a big event. This is why results after the second debate are only trickling in now. In the meantime, more people think Kerry won than think Bush won the third debate, by between 1 and 14 points (CBS 39-25, ABC 42-41, CNN/Gallup 52-39, Democracy Corps 41-36). In addition to overall numbers, Kerry was favored among undecideds by 14 points (CBS), and among independents by 7 points (ABC) or 20 points (CNN/Gallup). Among independents in battleground states, where it matters most, the margin was 9 points (Democracy Corps [D]). A summary of polls for all three debates can be found here.

October 13, 2004

Lately I have been asked what-if questions (What if Bush wins Ohio? What if Kerry wins Wisconsin?). I have three ways of answering this type of question. The last answer may be of practical use in guiding your activism!

1. Flipping states: How much is the win probability affected by guaranteeing a given state? If Kerry wins Florida, then his overall win probability today jumps to 83% (five-to-one odds). If Bush wins Ohio then his win probability is 87% (nearly seven-to-one odds).

2. Shifting the margin: What is the benefit of changing the margin by one point? You could imagine a campaign strategist making use of this to help decide where to place ads. For both candidates the three best states are Ohio, Florida and Pennsylvania. No surprises there.

3. Hitting the streets: How much do you affect the election by going somewhere to get out the vote? The way to do this calculation is to see how much the Electoral College win probability is changed by incrementing a state's margin by some fraction F, where F is inversely proportional to the state's voting population. This is because as an individual, you can only get out a finite number of votes.

Today, the best states to go to, in descending order, are: Iowa, Ohio, Nevada, and Florida. Things change a little bit if the margin is different from the estimate (for instance towards Kerry because of the incumbent rule as originated by Guy Molyneux and reviewed by the Mystery Pollster and Mark Shields), but the top four states always include Ohio and Nevada. Why Nevada? Nevada is a near-tossup and has a disproportionately high number of electoral votes relative to its population.
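The "hitting the streets" calculation can be sketched as a finite difference. Add a fixed number of votes in one state and convert that to a margin change inversely proportional to the state's turnout. Then see how much the overall win probability moves, expressed relative to New Jersey. Everything in the example is a hypothetical assumption, not the site's data: the margins, SEMs, turnout, the 230 safe EV, the 1,000 extra votes, the normal error model, and independence between states.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_prob(states, safe_ev, bumps=None, needed=270):
    """P(candidate reaches `needed` EV).

    states: name -> (electoral votes, margin in points, SEM in points)
    bumps:  name -> extra margin in points added to that state
    """
    bumps = bumps or {}
    dist = [1.0]
    for name, (ev, margin, sem) in states.items():
        p = normal_cdf((margin + bumps.get(name, 0.0)) / sem)
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)
            new[k + ev] += prob * p
        dist = new
    return sum(dist[max(needed - safe_ev, 0):])


def vote_values(states, turnout, safe_ev, extra_votes=1000, reference="NJ"):
    """Gain in win probability from extra votes in each state, relative to `reference`."""
    base = win_prob(states, safe_ev)
    gains = {}
    for name in states:
        # Extra votes shift the margin by roughly 100 * votes / turnout points,
        # so the same effort counts for more in a smaller state.
        bump = 100.0 * extra_votes / turnout[name]
        gains[name] = win_prob(states, safe_ev, {name: bump}) - base
    return {name: gain / gains[reference] for name, gain in gains.items()}


# Hypothetical inputs for illustration only.
states = {"NJ": (15, 6.0, 2.0), "OH": (20, 0.5, 2.0), "FL": (27, 0.0, 2.0), "NV": (5, -1.0, 2.0)}
turnout = {"NJ": 3.5e6, "OH": 5.5e6, "FL": 7.5e6, "NV": 0.8e6}
values = vote_values(states, turnout, safe_ev=230)
for name, value in sorted(values.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.0f} jerseyvotes")
```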

October 5, 2004

As previously mentioned, I have been looking at the party-ID numbers in the Gallup data. I have found evidence that party ID is not fixed over time. The Gallup poll internal numbers contain the fraction of voters who call themselves Republican, Democratic, or independent. The average GOP fraction is 39%, but it fluctuates. The fluctuation has been the source of much discussion and is said to be too high. As it turns out, the amount of fluctuation expected if the fraction of Republicans (for instance) were fixed over time can be predicted from binomial statistics. The expected standard deviation is sqrt(r*(1-r)/N), where r is the fraction 0.39 and N is the number of people per poll, about 1000. These numbers predict a standard deviation of 1.5%. From Gallup's data, the actual standard deviation is 2.9%, almost twice this. This suggests that party ID, as measured by Gallup, genuinely shifts over time, which supports Gallup's defense that weighting by party ID distorts the result.
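Here is the binomial argument above as a few lines of Python. The inputs are the figures quoted in the paragraph: a 39% GOP fraction, about 1,000 respondents per poll, and the observed 2.9% SD.

```python
import math

r = 0.39            # average fraction identifying as Republican
n = 1000            # approximate respondents per poll
observed_sd = 0.029

expected_sd = math.sqrt(r * (1 - r) / n)  # sampling noise alone, if r were fixed
print(f"expected SD {100 * expected_sd:.1f}%, observed {100 * observed_sd:.1f}%, "
      f"ratio {observed_sd / expected_sd:.1f}")
```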

However, using unweighted data has its own problem, namely that the sample may be consistently biased in one direction or the other. In 2000, Rasmussen did not weight and predicted a margin that was 9 points more favorable to Bush than the final outcome. A consistent bias of this kind is what critics currently accuse Gallup of.

But the cure may be as bad as the disease, as exemplified by Rasmussen's new approach. Rasmussen now weights by party ID, and his presidential tracking poll now fluctuates very little. Because party ID and preferred candidate (Bush/Kerry) are strongly correlated, his weighting procedure will always work to reduce the margin of the leading candidate. This may explain why his poll is so stable: statistically, too stable to be right. In recent 3-day tracking data (analyzing every third day, so that samples do not overlap), the standard deviation is 0.7%, whereas random fluctuation alone predicts an SD of 1.6%.

The real problem with weighting is this: the horserace result depends on assumptions about party ID. If party ID covaries with sentiment, then real changes will be filtered out, and it will be very hard to learn from weighted data who is ahead, the basic fact we want from polls. We can see an example of this today: a recent Zogby poll shows little change from the previous one.
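Consider a deliberately extreme, hypothetical case to see how weighting can filter out a real change. The whole opinion shift shows up as a change in self-reported party ID, while preferences within each group stay fixed. All group margins, sample shares, and weighting targets below are made-up illustration values. The within-group margins in particular are an assumption, not Gallup or Rasmussen figures.

```python
# Hypothetical Bush-minus-Kerry margin within each party-ID group (fraction of group).
GROUP_MARGIN = {"R": 0.8, "D": -0.8, "I": 0.0}
# Hypothetical weighting targets (fixed party-ID shares).
TARGETS = {"R": 0.35, "D": 0.39, "I": 0.26}


def unweighted_margin(shares):
    """Bush-minus-Kerry margin using the sample's own party-ID shares."""
    return sum(shares[g] * GROUP_MARGIN[g] for g in shares)


def weighted_margin(shares, targets=TARGETS):
    """Reweight each group by target/share before computing the margin."""
    weights = {g: targets[g] / shares[g] for g in shares}
    total = sum(shares[g] * weights[g] for g in shares)
    return sum(shares[g] * weights[g] * GROUP_MARGIN[g] for g in shares) / total


# A real shift toward Bush that also shows up as more people calling themselves Republican.
before = {"R": 0.39, "D": 0.35, "I": 0.26}
after = {"R": 0.41, "D": 0.33, "I": 0.26}

unweighted_change = unweighted_margin(after) - unweighted_margin(before)
weighted_change = weighted_margin(after) - weighted_margin(before)
print(f"unweighted change: {100 * unweighted_change:.1f} points")
print(f"weighted change:   {100 * weighted_change:.1f} points")
```

In this worst case the unweighted margin moves by about 3 points, while the weighted margin does not move at all.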

Therefore I currently think that both weighting by party ID (Rasmussen now) and not weighting at all (Gallup now, Rasmussen in 2000) have serious problems. A better way to weight would be to use a question or questions with fixed answers, such as "Who did you vote for in the last election, Bush, Gore or Nader?" Time magazine asks this, but does not weight by it. Of course, the unreliability of memory is a problem. Zogby Interactive takes a sensible approach: party ID is asked at a different time than candidate preference, which might de-link the two variables. If anyone knows of other organizations that go beyond simple party-ID weighting, please let me know.

October 2, 2004

The median electoral vote (EV) estimate is very sensitive to swings in reported opinion because of the winner-take-all mechanism of awarding EV. Under near-tie conditions I find that the change is about 30 EV per 1-point change in the popular margin. Therefore Kerry's approximately 110-EV slide since mid-August represents a 4-point swing, equivalent to 2% of voters switching from Kerry to Bush.

October 5 and 9 corrections: Looking at the numbers more carefully, from August 1 to mid-September the national popular margin swung by about 8 points, with 4% of voters switching. This works out to a 12-15 EV gain for one candidate per 1-point change in margin or turnout. For comparison, past EV outcomes were:

- 2000 (Bush): 271-266
- 1996 (Clinton): 379-159
- 1992 (Clinton): 370-168
- 1988 (Bush elder): 426-111-1
- 1984 (Reagan): 525-13
- 1980 (Reagan): 489-49
- 1976 (Carter): 297-240-1

October 9: Historically, the Electoral College margin has shown, on average, a 29 EV margin per 1% popular margin. This is consistent with my calculations this year: a 12-15 EV gain for one candidate is also a 12-15 EV loss for the other, a 24-30 EV change in the margin. Since June, neither Kerry nor Bush has gotten much past 320 EV, demonstrating that this is the close race that both campaigns have predicted all along.
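The arithmetic behind these conversions fits in a short Python sketch. The inputs are the figures quoted in this entry, and the function names are hypothetical. The factor of 2 reflects that a voter who switches sides removes one vote from one candidate and adds one to the other.

```python
def margin_swing(ev_change, ev_per_point):
    """Popular-margin swing (points) implied by a change in one candidate's EV."""
    return ev_change / ev_per_point


def voters_switching(margin_points):
    """Each voter who switches moves the margin by 2 (one off one side, one onto the other)."""
    return margin_points / 2


# Original October 2 estimate: 110 EV at about 30 EV per point.
swing = margin_swing(110, 30)
print(round(swing, 1), round(voters_switching(swing), 1))  # 3.7 1.8

# Corrected figures: 110 EV over an 8-point swing.
ev_per_point = 110 / 8                 # EV gained by one candidate per point
print(ev_per_point, 2 * ev_per_point)  # 13.75 27.5
```

The corrected 27.5 EV of margin per point sits close to the historical average of 29.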

In addition to the final polls, the outcome will ultimately be shaped by three big factors: (a) undecideds, (b) new voter registration, and (c) turnout. Undecideds usually break for the challenger, though this is not certain. Newly registered voters should in principle be reflected in polls, though how many of them pass likely-voter criteria is unclear. Turnout is a big unknown (though a known one). In 2000, Democrats did better than expected from pre-election polls. In 2002, Republicans did better. This year, unusual levels of progressive activism would seem to favor Democrats. But prediction is hard, especially of the future.

September 29, 2004

Relevant to the current Gallup controversy: Here is a table of Gallup national presidential polls, along with party-ID statistics for each poll. The GOP-Democratic margin in each poll correlates quite closely with the Bush-Kerry margin: the correlation coefficient is 0.73 (r²=0.53, P<0.001). In lay terms, the party-ID gap and the Bush-Kerry margin vary together, and variation in one can account for over half the variation in the other. The slope of a linear fit between the two is near 1: on average, every extra point of GOP-Democratic gap in the sample added about one point to Bush's margin.

This seems consistent with the idea that how much Gallup samples from each group (Republicans, Independents, Democrats) affects the poll outcome. But could it be the other way around: could voter sentiment affect self-reported party ID? One test of this is to see if the group sizes fluctuate as much as would be expected by chance. For a sample of 1000 voters composed of 39% R, 34% D, and 25% I (the average of all those polls), the percentage of R's would be expected to have a standard deviation of 1.5%. The actual SD is almost twice as large, 2.9%. Therefore party ID does vary more than sampling error would suggest. Maybe it varies with sentiment. Or maybe conditions at Gallup change (for instance, the time of day and week that calls were made). Hard to know. One thing is clear: their average party ID breakdown does not match known values (35% R, 38-39% D, 26% I).

Gallup Party ID bias

August 23, 2004

You can use the bias calculation to estimate where things are headed. If you think turnout will boost your candidate by N points, add that. If you think that one candidate will gain X points at the expense of the other, add 2*X. For instance, if turnout will increase Kerry's vote by 2 points, but Bush will pick up 1.5% of voters from Kerry, then the bias is 2 - (1.5 * 2) = -1%, or 1% to Bush.

August 20, 2004

For margin of error junkies: Rachel Findley pointed out this comparison of polls to election outcomes, which finds that only 84% of election outcomes fall within the reported 95% confidence interval. This discrepancy provides a way to estimate polling errors that go beyond sampling error. If the additional error is normally distributed, an appropriate correction would be to increase the reported margin of error by a factor of 1.4.
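The factor of 1.4 follows from the normal distribution. Here is a short Python check that assumes, as the text does, that the extra error is normally distributed. A reported interval of plus or minus 1.96 reported SDs that covers only 84% of outcomes must equal about 1.41 true SDs.

```python
from statistics import NormalDist


def moe_inflation(observed_coverage, nominal_coverage=0.95):
    """Factor by which a reported MoE must grow to match the observed coverage.

    Assumes normal errors: the reported half-width z_nominal * sd_reported
    actually covers the truth with probability observed_coverage, so
    z_nominal * sd_reported = z_observed * sd_true.
    """
    z = NormalDist().inv_cdf
    z_nominal = z(0.5 + nominal_coverage / 2)    # about 1.96 for 95%
    z_observed = z(0.5 + observed_coverage / 2)  # about 1.41 for 84%
    return z_nominal / z_observed


print(round(moe_inflation(0.84), 1))  # 1.4
```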

However, this correction does not apply to my calculation because instead of relying on reported MoE, I use inter-poll data to make an independent estimate of variance.

Methods

These calculations are based on state polls from many polling organizations (data sources). The primary source is Race 2004, which emphasizes likely voters (LV). Whether Nader is included depends on his ballot status in that state; if unknown then he is included. Polling organizations that provide rolling averages (Rasmussen FL, MI, MN, OH, PA) are updated twice a week. The data are fed through a MATLAB script to mathematically compute all of the above results.

The first step is to calculate the probability of winning a state, taking into account the variability of polls. This is done by calculating simple statistics on the polls: average and standard error of the mean (SEM). This is then converted to a probability of a win using the normal distribution (bell-shaped curve).
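This first step can be sketched in Python, though it is not the site's MATLAB code. The function name is hypothetical, and margins are assumed to be candidate-minus-opponent in points. The zero-spread guard is an addition not described in the text.

```python
import math
from statistics import NormalDist, mean, stdev


def state_win_probability(margins):
    """Probability that the candidate's true margin in one state is above zero.

    margins: recent poll margins (candidate minus opponent, in points) for one
    state; at least two are needed to estimate the spread.
    """
    avg = mean(margins)
    sem = stdev(margins) / math.sqrt(len(margins))  # standard error of the mean
    if sem == 0:
        # Identical polls: no spread to work with, so treat the state as decided.
        if avg > 0:
            return 1.0
        if avg < 0:
            return 0.0
        return 0.5
    return NormalDist().cdf(avg / sem)  # area of the bell curve above zero


print(round(state_win_probability([2, -1, 3]), 2))
```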

The second step of the calculation is complex: it calculates the probability of every possible outcome. For 17 states the total number of possibilities is 2^17 = 131,072. Adding Colorado, Tennessee, North Carolina and Virginia makes over 2 million possibilities. In order to reduce computing time, probabilities less than 0.1% or greater than 99.9% are classified as certain outcomes. Each possibility corresponds to a different number of electoral votes (EV).
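The brute-force enumeration can be sketched in Python, assuming state outcomes are independent. The function name and example numbers are hypothetical; the 0.1% and 99.9% cutoffs are the ones given above.

```python
from itertools import product


def enumerate_outcomes(states, safe_ev=0):
    """Brute-force EV distribution: {EV total: probability}.

    states: list of (win probability, EV). States below 0.1% or above 99.9%
    are treated as certain, which shrinks the 2**n enumeration.
    """
    certain_ev = safe_ev
    uncertain = []
    for p, ev in states:
        if p > 0.999:
            certain_ev += ev           # treated as a sure win
        elif p >= 0.001:
            uncertain.append((p, ev))  # genuinely in play
        # p < 0.001: treated as a sure loss, contributes no EV

    dist = {}
    for outcome in product((True, False), repeat=len(uncertain)):
        prob, total = 1.0, certain_ev
        for won, (p, ev) in zip(outcome, uncertain):
            prob *= p if won else 1 - p
            total += ev if won else 0
        dist[total] = dist.get(total, 0.0) + prob
    return dist


# Hypothetical example: two toss-ups plus one near-certain win and one near-certain loss.
print(enumerate_outcomes([(0.5, 10), (0.5, 20), (0.9995, 5), (0.0001, 7)], safe_ev=100))
# {135: 0.25, 115: 0.25, 125: 0.25, 105: 0.25}
```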

Those are then tabulated to come up with a 50th-percentile (median) outcome, as well as a 95-percent confidence interval. The 95-percent confidence interval is particularly useful because, like the famous Margin of Error (MoE), it gives the range of outcomes that would occur 95 percent of the time based on the available information. Note that the ends of this confidence interval are very similar to the 50th-percentile outcomes from a 1-point bias towards Bush or towards Kerry. Note, October 21: The confidence interval varies in size somewhat. When large states (such as OH, WI, FL) are toss-ups, the confidence band can be up to twice as large.
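Given an EV distribution like the one from the enumeration step, the median and the interval can be read off the cumulative probabilities. This Python sketch assumes the 95% interval is the central one (2.5th to 97.5th percentile), which the text implies but does not state. The distribution in the example is hypothetical.

```python
def percentile(dist, q):
    """Smallest EV total whose cumulative probability reaches q."""
    cumulative = 0.0
    for ev in sorted(dist):
        cumulative += dist[ev]
        if cumulative >= q:
            return ev
    return max(dist)  # guards against rounding leaving the total just under 1


def summarize(dist, needed=270):
    """Median, central 95% interval, and P(EV >= needed) from {EV: probability}."""
    median = percentile(dist, 0.5)
    interval = (percentile(dist, 0.025), percentile(dist, 0.975))
    win = sum(p for ev, p in dist.items() if ev >= needed)
    return median, interval, win


# Hypothetical distribution for illustration.
print(summarize({250: 0.1, 260: 0.2, 270: 0.4, 280: 0.2, 290: 0.1}))
```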

Although this calculation takes into account the variability of polls, it is important to note what it does not do. It is integrated over the last three polls (which mostly span 1-4 weeks), so fast swings do not show up. It does not reject any polls, nor does it account for potential bias or predict future opinion shift in any way.

October 6: I am now using a far faster way of calculating the probability distribution in closed form (see discussion). Thanks to Lee of Quant Consulting.
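The text does not spell out the closed-form method. One standard fast approach, assumed here, builds the EV distribution one state at a time. This is equivalent to multiplying the polynomials (1 - p) + p·x^EV for each state. The cost grows with the number of states times the EV range, instead of as 2^n. A Python sketch with a hypothetical function name:

```python
def ev_distribution(states, safe_ev=0):
    """Exact EV distribution as a list: dist[k] = probability of exactly k EV.

    states: list of (win probability, EV), assumed independent. Each state
    convolves the running distribution with a two-point (lose/win) distribution.
    """
    max_ev = safe_ev + sum(ev for _, ev in states)
    dist = [0.0] * (max_ev + 1)
    dist[safe_ev] = 1.0
    for p, ev in states:
        new = [0.0] * (max_ev + 1)
        for k, prob in enumerate(dist):
            if prob:
                new[k] += prob * (1 - p)  # state lost: total unchanged
                new[k + ev] += prob * p   # state won: add its EV
        dist = new
    return dist


print(ev_distribution([(0.5, 1), (0.5, 2)]))  # [0.25, 0.25, 0.25, 0.25]
```

For 21 states this takes a few thousand operations rather than the 2 million combinations of the brute-force approach.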

Poll selection

Polls are unfiltered and equally weighted, in part because selecting data can introduce unintended biases. Therefore, even though some polling organizations give demonstrable and consistent outlier results, such as state-level Fox News polls (example), Fairleigh Dickinson Public Mind, and the Badger Poll, all polls are still included. I have also included Zogby Interactive polls, which have relatively untested methods but do not show measurable bias in either direction from the average.

However, even when all of the above polls are excluded, the result is virtually identical. Thus the method can be tailored but is also robust enough to give a reasonable answer even with no selection of data. For a fuller discussion of my methods, see this DailyKos thread.

Bias

These calculations would be affected if there is an overall poll bias, which can have a large effect in a close race. Bias could happen if polling methods do not accurately sample actual voting patterns. However, in 2000, Ryan Lizza at The New Republic compiled state polls. On the day before the election, that compilation indicated that the outcome would hinge on Florida. This matches what happened, arguing against major built-in biases in state polls.

Other factors may have an effect of unknown size, such as increased motivation by Democrats or the possibility that undecided voters will break against the incumbent.

A key measure of the current closeness of the race is the Popular Meta-Margin (a.k.a. Swing Index): the across-the-board percentage shift in opinion (or poll bias) that would be needed to make the Electoral College an exact toss-up. This is analogous to the popular margin in national polls, but is more relevant to what it would take in terms of real electoral mechanisms.
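One way to compute the Meta-Margin can be sketched in Python under stated assumptions:

- Every state's margin is shifted by the same amount.
- State outcomes are independent.
- Bisection finds the shift at which the win probability is exactly 50%.

The state list, `SAFE_KERRY_EV`, and the search bounds are hypothetical. Positive values mean Kerry leads.

```python
from statistics import NormalDist

# Hypothetical states: (Kerry-minus-Bush margin in points, SEM in points, EV)
STATES = [(1.0, 1.5, 20), (-2.0, 1.5, 27), (0.5, 2.0, 21), (-1.0, 2.0, 10)]
SAFE_KERRY_EV = 230  # hypothetical EV from states treated as certain
NEEDED = 270


def kerry_win_prob(shift):
    """Kerry's Electoral College win probability after adding `shift` points everywhere."""
    dist = {SAFE_KERRY_EV: 1.0}
    for margin, sem, ev in STATES:
        p = NormalDist().cdf((margin + shift) / sem)
        new = {}
        for total, prob in dist.items():
            new[total + ev] = new.get(total + ev, 0.0) + prob * p
            new[total] = new.get(total, 0.0) + prob * (1 - p)
        dist = new
    return sum(prob for total, prob in dist.items() if total >= NEEDED)


def meta_margin(lo=-10.0, hi=10.0, tol=1e-6):
    """Bisection for the toss-up shift; win probability rises monotonically with shift."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kerry_win_prob(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    # If Kerry leads, the shift needed to reach a toss-up is toward Bush (negative).
    return -(lo + hi) / 2


print(f"Meta-Margin: {meta_margin():+.2f} points")
```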

Bias occurs if (a) polling organizations give skewed results (on Election Eve 2000, they favored Bush by 2.5%) or if (b) one side turns its voters out better or worse than predicted. The closeness of the race for the last few months (less than 3% in either direction) indicates a heavy role for registration of new voters and election-day turnout.

Predicting the future

You can use the bias calculation to estimate where things are headed. If you think turnout efforts will boost your candidate by N points, add that. If you think that one candidate will gain X points at the expense of the other, add 2*X.

For instance, if you predict that turnout will increase Kerry's vote by 2 points, but Bush will pick up 1.5% of voters from Kerry, then the bias to use is 2 - (1.5 * 2) = -1%, or 1% to Bush.
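The same arithmetic works as a tiny Python function; the function and argument names are hypothetical. Here N is the turnout boost and X is the net share of voters moving to your candidate, so a candidate losing voters gets a negative X.

```python
def projected_bias(turnout_boost, net_gain_from_opponent):
    """Bias (points) toward your candidate; negative values favor the opponent.

    turnout_boost: points your candidate gains from turnout (N in the text).
    net_gain_from_opponent: percent of voters moving from the opponent to your
    candidate (X in the text); each such voter moves the margin by 2.
    """
    return turnout_boost + 2 * net_gain_from_opponent


# Kerry's perspective: +2 from turnout, but Bush picks up 1.5% of voters from Kerry.
print(projected_bias(2, -1.5))  # -1.0, i.e. 1% to Bush
```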

Site History

My academic specialties are biophysics and neuroscience. In these fields I make heavy use of probability and statistics in analyzing complex experimental data, and have published many papers using these approaches. Polling provides an interesting everyday example.

I originally did this calculation to help think about how to allocate my campaign contributions. I believe that one can make the biggest difference by donating at the margin, where probabilities for success are 20-80%. To read a discussion click here. This site now gets over 10,000 hits per day.

In addition to Kerry's race, the Senate is within reach. My recommendations for Democratic donations are listed on my ActBlue page.

For those of you wanting to reinforce the national election (to see why, go to the bias analysis above), I recommend the voter registration and turnout organization America Coming Together. For the optimists there is also the DCCC.

To point out the obvious, the converse interpretation of these calculations for Republicans is to direct resources toward the White House and the Senate.

Thanks to Drew Thaler for the site facelift and for the interactive map!

Other election resources

Data sources: Race 2004, electoral-vote.com, and RealClearPolitics. [database]
People who apply probabilistic methods to polls: Matthew Hubbard, Larry Allen, Andrea Moro.
DailyKos Senate analysis
Charlie Cook's National Journal analyses of the presidential race (older column) and Senate
OurCongress.org (House)
Bush approval rating. Less than 46 means he is toast, more than 53 means Kerry is toast; in between is uncharted territory. [Pollkatz Graph]
National horserace numbers
Ed Fitzgerald's survey of other electoral vote analyses
Analysis by William Saletan at Slate on current polls
Running commentary by Ruy Teixeira on reading polls
Comparison of polls with ultimate election outcomes - suggests that the published margin of error is too optimistic by a factor of 1.4