From Prof. Sam Wang of Princeton University.
This page is available online at http://synapse.princeton.edu/~sam/pollcalc.html
Below is a meta-analysis directed at the question of who would win the Electoral College in 2004. Meta-analysis provides more objectivity and precision than looking at one or a few polls, and in the case of election prediction gives a more accurate current snapshot. In this case the median decided-voter calculation captured the exact final outcome. Calculations were based on all available state polls in the week before the election, which were used to estimate the probability of a Bush/Kerry win, state by state. These were then used to go through all possible combinations of battleground state results. A speculative calculation based on shifts from polls was incorrect (see the full methods). What failed? See this exchange. I do not take donations, but readers have given to ActBlue and the NRSC.
Median state poll outcome, decided voters: Kerry 252 EV, Bush 286 EV (±39 EV MoE) (probability map)
Popular Meta-Margin among decided voters (explanation): Bush leads Kerry by 0.8%
State poll median with speculative undecided assignment: Kerry 283 EV, Bush 255 EV (probability map)
Electoral prediction with turnout: Kerry 311 EV, Bush 227 EV (probability map)
[Election Night brochure (PDF)] [Overview of calculation]
Real electoral result: Kerry 252 EV, Bush 286 EV. (outcome map) (comparison)
Thursday, December 2, 11:30AM: Several interesting and good articles about exit polls. Here is a piece in the Washington Post about demystification of exit polls. Also, in this week's New Yorker is a great article by Louis Menand on a meeting that took place at Stanford at which pollsters discussed the interpretation of this year's results. Essentially, they think the values talk is a misinterpretation of a bad question. The article argues that it was essentially terrorism (i.e. 9/11) that swung it for Bush. The article is print only - go get it. The December 6 issue.
Thursday, December 2, 11:30AM: Election results can be displayed in many ways. The cartograms used on this site are a way of displaying electoral votes accurately. To see displays done by population or on a county-by-county basis, see these interesting maps.
Sunday, November 28, 6:00AM: I am moving my post-election comments on Florida here. This is partly to maintain order and partly because I am starting to wonder whether there is a story here. After chewing over the Berkeley group's analysis and corresponding with some of you, I have thought of reasons why there is no real county-level anomaly. The essential problem is that the largest e-voting counties have large populations and have no counterpart for comparison. I believe that this problem is insurmountable by any statistical analysis, no matter how sophisticated.
Tuesday, November 16, 9:00PM: Some notes on future improvements to the calculation. David Kline points out that to calculate the single-state probabilities, for small (<30) numbers of polls, a t-distribution is more appropriate than a normal distribution. This distribution has longer tails and will give less certainty in the estimates. Going in the converse direction, in retrospect some states were more certain than the one-week snapshot indicated, and outlier polls tended to introduce occasional inaccuracies. A more sophisticated averaging procedure is needed, one that uses more polls but gives more weight to recent ones.
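To make the t-distribution point concrete, here is a minimal sketch comparing the two ways of turning a poll margin into a single-state win probability. It is an illustration under stated assumptions, not the calculation used on this site: the function names, the use of n - 1 degrees of freedom, and the example inputs are all hypothetical. The t CDF is obtained by numerically integrating the density so that only the standard library is needed.

```python
import math
from statistics import NormalDist


def t_pdf(x, dof):
    """Density of Student's t-distribution with dof degrees of freedom."""
    c = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return c * (1 + x * x / dof) ** (-(dof + 1) / 2)


def t_cdf(x, dof, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about zero."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, dof) + t_pdf(a, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def win_prob_normal(margin, sem):
    """Win probability if the true margin is normal around the poll average."""
    return NormalDist().cdf(margin / sem)


def win_prob_t(margin, sem, n_polls):
    """Same, but with a t-distribution on n_polls - 1 degrees of freedom (assumed)."""
    return t_cdf(margin / sem, n_polls - 1)


# Hypothetical example: a lead of 2 points with SEM 1 point, from 5 polls.
print(win_prob_normal(2.0, 1.0))
print(win_prob_t(2.0, 1.0, 5))
```

With only a handful of polls the t-based probability sits closer to 50% than the normal-based one, which is the "less certainty" described above.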
Tuesday, November 16, 11:00AM: Footage of my appearance on Fox News: [2 MB version] [30 MB] [200 MB]
Monday, November 8, 10:00PM: This is really interesting: a rundown of exit poll breakdowns by gender, education, issue, and so on. It seems to refute ideas that have been going around about the importance of religion and rural voters. The upshot seems to be that the biggest positive for Bush was terrorism (as opposed to Iraq, which was a big negative). Religion didn't matter any more than in 2000, nor did rural voters. Terror, terror, terror. Combined with the Mellman article cited below I surmise that Bush's job approval rating was boosted by perceptions about his ability to counter terrorism.
Saturday, November 6, 9:00AM: Here is a pre-election article by Kerry consultant Mark Mellman that predicted Bush's popular vote share to within 0.1%. Going by the article, the factors that went into the calculation included job approval, the economy, war, and right track/wrong track sentiment. Putting aside the talk about values (based on a single poorly-worded exit poll question), what about the relatively simple hypothesis that the economy's not that bad combined with loyalty to a wartime president?
Despite all this, the true margin of victory was about 150,000 votes in Ohio, and even smaller margins in Wisconsin and New Hampshire. As I said during the campaign, Electoral College mechanisms (essentially because of increased clustering of Bush supporters) gave Kerry an approximately 2% advantage compared with the popular vote. This accounts for the difference between the Meta-Margin above and the popular vote margin.
Friday, November 5, 6:00PM: I continue to be deluged by email on the subject of the anomalies in Florida voting in small counties. As I said below, these data sort fairly well by the rural/urban divide. A graph by Jeff Chambers suggests that there may be some small remaining anomaly. However, this could be ballot spoilage. Here's an analysis demonstrating the size of the anomaly. I don't think this is going anywhere. The most constructive thing at this point is to redirect energies to voting reforms, such as those advocated by the Open Voting Consortium. This has the advantage of serving all Americans.
In the coming days I will revisit polling data to see what the turning points were in the election, as measured through the Electoral College. I suspect that some of the shifts in my data that I could not explain may be explained by campaign moves that were not obvious at the level of national media. The electoral vote calculation is low-noise and captures swings well, so this is a perfect use for it.
Friday, November 5, 10:30AM: Commentary on the electoral divide by the American Prospect's Harold Meyerson.
Thursday, November 4, 9:00PM: A number of you have mailed in a plausible explanation for the seemingly anomalous results in optical-scan voting counties in Florida. Essentially, the apparent disproportionate voting for Bush occurred in less populated counties. These counties, being non-urban and having fewer resources, are still using optical-scan ballots. Therefore three variables are correlated here: smaller populations, voting technology, and Democratic crossover for Bush. As a result there is a correlation between the last two variables, but the real variable of interest is population. This creates an ambiguity of interpretation, and is an example of the dictum that correlation is not causation.
The reason population is of interest is that it is a stand-in for rural vs. urban dwellers, as Andy Royle and others point out. Royle made a graph of population size and Bush victory margin, graphed by county. This graph suggests to me that in rural counties, many people are registered Democrats but cross over to vote for Republicans. This could be because Democratic party allegiance is a holdover from times when rural voters were more likely to be Democrats. Many new registrants in these counties are Republicans, which supports the idea.
This speaks to the idea going around that heartland and rural voters are turning to Republicans on "values" issues. In my view this conflicts with a natural link between them and the stated policy goals of the Democratic Party. It suggests a disconnect between stated Democratic values and how the party is perceived. For an interesting exposition I recommend What's The Matter With Kansas? by Thomas Frank, which addresses the dominance of the GOP in the heartland.
Thursday, November 4, 12:05AM: The meta-analysis has helped me stay rational in this election cycle, but some thought needs to go into future action. The election was close, but in a binary event 0.51 rounds up to 1. Anyway, here are some cogent post-election thoughts from Josh Marshall.
Wednesday, November 3, 7:45PM: A few items of business. First, note that without my optimistic assumptions, I nailed the Electoral College tally. This, even in the face of single-state probabilities that give a different map and EV total. Poll margins are also quite close: see these validations of the method (still under modification but viewable). Now there's a testament to meta-analysis (and an indictment of the mindless mainstream horserace coverage of polls this season). Note, however, the large margin of error on my decided-voter estimate. A one-percent swing in Ohio or Wisconsin would have changed the total EV count - in the case of Ohio, to great effect.
This site. I am considering what to do. It is highly disorganized because I started off HTML-coding by hand. If readership continues, I may transfer to regular blogging software. However, this requires time or help. Another question is what topic(s) to cover. Let's first see if traffic continues. Your letters of support are heartening. I can't reply to all, but please continue to write. If your comments are for general consumption, here is a public thread.
Exit polls. Much of my mail today concerns exit polls, with calls for analysis to check for widespread fraud. Think about the lessons you have learned here. A more plausible possibility is that exit polls themselves are biased, for instance by the identity of the questioner or the temperament of the respondent. See my analysis of exit polls.
The incumbent rule. After some thought, I realize that we simply don't know what factors drove the result yesterday. The problem is that all the factors sum to give final voting, and are therefore hard to distinguish. In the New Republic is a suggestion that turnout was either symmetric or went against my assumption.
Fraud in Florida? This is an interesting question, and my evidence suggests that further investigation is warranted. Of all the battleground states, Florida was one of the most surprising in terms of deviating from the outcome expected from polls (and to a lesser extent, exit polls). The deviation favored Bush. It's too bad about this year's exit poll problems, because they would have provided an independent test. My analysis of this is in the validation of 2004 results.
Wednesday, November 3, 3:15PM: Dear readers, I am getting lots of mail on my brilliant prediction. Thanks to all of you on both sides of the aisle. For many opponents of Bush, this election has been a wrenching event, not unlike a death in the family. This election was a highly significant political event that we will be feeling for decades. I will post some of your mail later. For now, I post a thought-provoking note from someone opposed to my views, along with my reply, here. It has the tonic effect of returning this site to its original purpose, hardcore quantitative analysis. With that, I must now attend to my day job...
Wednesday, November 3, 11:30AM: To summarize the points so far this morning: the electoral outcome matches pre-election polling data very closely, with the possible exception of Florida. Therefore the electoral count looks a lot like the decided-voters median listed above. However, my final predictions were wrong. It appears that my add-on assumption about undecideds (the rule that they break against the incumbent) was wrong; this may have been because of the war, as suggested by the Mystery Pollster. My turnout assumption was also wrong. Exit polls do match my projection, which is surprising to me. This could be because those data are somehow non-representative, for instance because of gender bias.
Specific comparisons: victory margins were predicted well by polls. Out of 23 battlegrounds, the direction of the outcome was predicted in 22. The exception was Wisconsin, where the polling margin was 0.4% for Bush and the actual margin was about 0.4% for Kerry. Quantitatively, 12 victory margins were within one standard error and 17 were within the 95% confidence interval. Not perfect, but not bad.
Wednesday, November 3, 10:30AM: In Ohio, many provisional ballots are left to be counted. In the meantime, here is some general analysis. Overall, pre-election polls, exit polls and actual voting are mostly correlated. An exception occurs in Florida, suggesting that something unusual might have happened there, either in voting or in exit and opinion polling. The effect is probably smaller than Bush's margin.
Voting margins track pre-election polls - with exceptions. Voting margins were more favorable to Bush by 0.9 ± 0.6% (median ± SEM; SD, 3.1%) than pre-election polls. This is very near to no difference at all. All discrepancies greater than 5 percentage points (AR, HI, NC, WV) occurred in states with few recent polls. The next largest discrepancy was in Florida, 3.6% towards Bush. Since Florida had so many polls, this is 4 SEM away from zero. Otherwise the match is quite good. Overall, 12 out of 23 pre-election estimates were within 1 SEM of the voting outcome, less than the expected 16 but not bad. I conclude that in most cases this year, five or more likely-voter polls taken in the week before the election gave an estimate that strongly correlated with final voting.
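The "expected 16" figure can be checked with a short calculation. This sketch assumes the errors are normally distributed and independent across states, which the text does not state explicitly; the function name is hypothetical.

```python
from statistics import NormalDist


def expected_within(n_states, k):
    """Expected number of estimates within k standard errors, assuming normal errors."""
    nd = NormalDist()
    return n_states * (nd.cdf(k) - nd.cdf(-k))


print(expected_within(23, 1.0))   # about 15.7, i.e. roughly 16 of 23
print(expected_within(23, 1.96))  # about 21.85 of 23 inside a 95% interval
```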
Voting margins and exit polls differ systematically. Exit polls were more favorable to Kerry by 3.0 ± 1.5% (median ± SD) than real voting. This is tentative since I do not have the most complete exit polling data. Currently, in FL I have an exit poll margin of Kerry +1 and a real voting margin of Bush +5%; the discrepancy, 6 points, is again somewhat extreme. This is consistent with the discrepancy noted above. In OH I have an exit poll margin of Kerry +1 and a real voting margin of Bush +2; the discrepancy, 3 points, is right in the middle of the range. I don't know why exit polls would differ systematically, though one obvious possibility is the gender gap in respondents.
Graphs and further analysis will follow shortly.
Wednesday, November 3, 8:00AM: Good morning. First, the basic points: the polling data were fine, but my follow-up assumptions about net undecideds and turnout were wrong. The statistical approach is fine, and indeed still the best way to look at the data. The final result will be very near, and perhaps exactly equal to the decided-voters calculation. Finally, now that votes are mostly in, we have real results to compare with pre-election polls. I will be doing that in the coming days. This is the point at which something real can finally be learned.
Wednesday, November 3, 2:15AM: I was very wrong about FL and about the overall offset. The outcome is somewhere between the decideds-only and the with-undecideds medians. However, for months I have been saying that it was about Ohio, and it is. The margin there seems headed for about 1%, without absentee or provisional ballots. Another squeaker - though things are not entirely over yet.
Wednesday, November 3, 12:45AM: Possibly K259, B254 (assuming MN, IA, HI, MI). Then Ohio (20EV) - returns here. May come to Cuyahoga County (includes Cleveland) and absentees. Also NV (5EV), but at this point that doesn't matter.
Wednesday, November 3, 12:15AM: Consistent with the gender imbalance, exit polls seem to be biased toward Kerry relative to final totals. Not crunching numbers yet but will try later.
Tuesday, November 2, 10:00PM: Zogby's projections, which resemble mine, are here. Here is a plot of afternoon exit polls against my last poll margins. As you can see, most of the data points fall around Kerry +3%, the assumption that went into my final projection. However, the gender breakdown is 59F-41M, suggesting a biased sample, and thus indicating a problem with my assumption. However, as you all know, much depends on FL and OH. The partial returns for FL (77% of precincts reporting) indicate Bush - does anyone know if these include early voting, absentee and overseas? Overseas and absentee may not be counted until Thursday. Early votes - probably counted.
Tuesday, November 2, 5:45PM: There are so many of these exit polls - just like the campaign season all over again. Time for meta-analysis. I am plotting all that I can find on my graph. Nearly all of them are above the no-bias line. Looking at all of them at once, the median bias seems to be about +3%. This suggests to me that if overall, these polls reflect total voting, OH and FL will end up for Kerry, with margins of 2% by night's end. (Famous last words...)
Tuesday, November 2, 3:00PM: Early exit polls on Drudge (above the title). If you plot them on my brochure graph, their median is around +5% bias. This may regress as Republicans get to the polls, but I think this is a telling sign. Not sure if I will be blogging the returns - just a heads-up for you. I wonder if my prediction was too cautious! That's what I get for paying too much attention to my email.
Tuesday, November 2, 2:30PM: Running list of brochure errata and edits here. All are fixed in the current version, downloadable here (PDF). Apologies for these small errors. Page 1: where it says "215 EV" it should say "230 EV." Page 4: NM margin is Bush +2.7 ± 1.7%, not +3.1 ± 1.5%. I now have a Kerry win probability >50%. Page 4: My predictions are added in parentheses. Page 6: The bias calculator is updated to extend to a larger range to the right. Old one still OK - but you may need this.
Tuesday, November 2, 11:30AM: Poll closing times are listed here. The earliest bellwether is NH (poll average Kerry +2.9 ± 1.1%, n=7). Kerry by 5-7% would be consistent with my final assumptions. Next is WV. Let's see what happens.
Tuesday, November 2, 10:45AM: I have added two pages to the brochure - a graph to help you calculate for yourself the net bias, and an electoral map.
Political psychologist Manuel M. Frank writes in with a model indicating that I have underestimated the turnout bias. Looking at his work, my original figure of +2.5% agrees with his calculations based on Annenberg interest-level surveys. His work can be read here; he also has additional comments.
Monday, November 1, 11:00PM: The last prediction is in. I have set the turnout figure to a lower bound of 1.5%, for a net of +3.0%. This is a value for which the median and map are readily reconcilable, which may reduce my email load. Readers, thanks for the dialogue. The box and final summary are updated. Other entries are bumped down. And here is your Election Night brochure (PDF).
Finally, a note to all of you. As this site has grown since July, you have seen me try new assumptions on the fly. Most have been improvements, but at times I do something unusual, at which time I hear from many of you. This feedback, although invisible to most of you, has been extremely valuable. In a sense, this site has many open-source qualities. In the event that the prediction is accurate, it was the most amazing of phenomena, an Internet-based collaboration. Thanks for your ideas, your criticisms, your praise, your strange email, and most of all your readership (nearly 80,000 visitors today - site meter).
The last polls for the 2004 calculation come from Granite State (NH), Ohio Poll (OH), Opinion Dynamics (R) (FL, IA, OH, WI), Rasmussen (FL, MI, MN, OH, OR, PA, VA, WA), Strategic Vision (R) (FL, NJ, OH, PA), Survey USA (AR, MO, NC, PA), and Zogby (AR, MO, NH, OR, TN, WA).
Monday, November 1, 6:30PM: I've been getting criticisms of my original estimates of the undecided and turnout adjustments. Why don't we postpone further discussion until tomorrow, when we know the true outcome. Anyway, I have given you enough information to let you make your own predictions.
Lower and upper bounds can be set using Gallup's last poll. This shows Kerry +2 among RVs and Bush +2 among LVs. The difference between these sets an upper bound of +4%. To set a lower bound: where data are available, new registrations give Democrats an approximately 0.5% edge in swing states; these new registrants are first-time voters who may fail LV screens. Also, the election will be high-turnout, suggesting that at least some of the 4% gap in Gallup will be filled. For example, a turnout increase from 50% to 62% would lead to +1.0%. Using this as a very rough estimate (and using it to subsume other factors as well), these two factors sum to +1.5%, a lower bound.
Finally: new registrants and overseas voters skew anti-Bush by a large margin, and 527s are pouring massive amounts of money into GOTV activities. I do not know how large these effects will be, except that they are likely to be towards Kerry. The suggested cell phone problem in polling is probably irrelevant (more on this later). I also do not know to what extent the 4 million evangelicals or Rove's 72-hour operation will materialize for Bush. Known unknowns...
Monday, November 1, 12:00PM (updated 11:00PM): Here are my final calculations and predictions. Extended discussion and supporting links can be found in the November 1, 8:00AM update. I have a brochure for your use to follow the returns on Tuesday night and test my assumptions (and yours). Here it is.
The map below shows Kerry and Bush's win probabilities, individually by state. The closer a state is to a tossup, the closer it will be to white. The map uses undecided and turnout assumptions; maps that do not make these assumptions can be seen by clicking in the box above. Note that unlike the map, the median projection takes compound events into account. Thus the map EV total is often not the same as the median EV total. An explanation is here. Click on the map for an interactive pop-up (thanks to Drew Thaler). For difficult browsers here is a static map and Ravi's simpler calculator.
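A toy example may help show why the map total and the median can differ. The sketch below uses three hypothetical states (not 2004 data) and literally steps through every win/loss combination, assuming the states are independent. The map approach awards each state to whoever is favored; the median looks at the whole distribution of totals.

```python
from itertools import product

# Hypothetical states: name -> (electoral votes, win probability for one candidate)
STATES = {"A": (10, 0.6), "B": (10, 0.6), "C": (10, 0.6)}


def map_total(states):
    """Add up EV from every state where the candidate is favored (p > 0.5)."""
    return sum(ev for ev, p in states.values() if p > 0.5)


def median_total(states):
    """Enumerate all 2^n outcomes and return the median EV total."""
    outcomes = {}
    names = list(states)
    for wins in product([False, True], repeat=len(names)):
        prob, total = 1.0, 0
        for name, won in zip(names, wins):
            ev, p = states[name]
            prob *= p if won else 1.0 - p
            total += ev if won else 0
        outcomes[total] = outcomes.get(total, 0.0) + prob
    cumulative = 0.0
    for total in sorted(outcomes):
        cumulative += outcomes[total]
        if cumulative >= 0.5:
            return total


print(map_total(STATES))     # 30: all three states lean the same way
print(median_total(STATES))  # 20: sweeping all three has probability of only 0.216
```

Winning three 60% states at once is a compound event with probability 0.6 x 0.6 x 0.6, so the most representative total is lower than the map suggests.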
Overview: The basic calculation derives from polls only. Using statistical methods of meta-analysis, I use polls to calculate a starting point, referred to as "decided voters only." This result is an uncorrected snapshot of where the polls stand. In addition to this, I estimate the effects of last-minute undecided/uncommitted voter decisions and differential turnout. My estimates are supported by evidence, but are by no means certain. Results based on uncertain assumptions are clearly labeled. To let you try your own assumptions, a table of medians with different bias values is given here.
The basic decided voter result. The median of Kerry 252 EV, Bush 286 EV among decided voters was calculated from 168 polls taken in 23 battleground states, by stepping probabilistically through all possible outcomes. Most of these polls were completed between October 25 and November 1. The EV estimate carries a large amount of uncertainty: the 95% confidence interval is ±39 EV. Thus, if only decided voters counted, the nominal Kerry win probability would be 20%, or 4-1 in favor of Bush.
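For readers who want to try this themselves, here is a minimal sketch of one way to step through all outcomes without listing each of the 2^23 combinations: build the distribution of EV totals one state at a time. This is an assumed implementation for illustration and not necessarily how the MATLAB script does it. It treats states as independent, and the inputs shown are hypothetical toy values, not the 2004 probabilities.

```python
def ev_distribution(states):
    """Return dist, where dist[k] is the probability of winning exactly k EV.

    states: list of (electoral_votes, win_probability) pairs, assumed independent.
    """
    dist = [1.0]  # before any state is counted: 0 EV with certainty
    for ev, p in states:
        new = [0.0] * (len(dist) + ev)
        for total, prob in enumerate(dist):
            new[total] += prob * (1.0 - p)  # state lost
            new[total + ev] += prob * p     # state won
        dist = new
    return dist


def median_ev(dist):
    """Smallest EV total whose cumulative probability reaches 50%."""
    cumulative = 0.0
    for total, prob in enumerate(dist):
        cumulative += prob
        if cumulative >= 0.5:
            return total
    return len(dist) - 1


def prob_at_least(dist, threshold):
    """Probability of winning at least threshold EV."""
    return sum(dist[threshold:])


# Hypothetical toy inputs: two states worth 3 and 5 EV, each a coin flip.
toy = ev_distribution([(3, 0.5), (5, 0.5)])
print(median_ev(toy))         # 3
print(prob_at_least(toy, 5))  # 0.5
print(toy[8])                 # 0.25, the chance of winning both
```

With real inputs, the win probability would be the mass at or above the number of battleground EV still needed after safe states are counted, and the probability of an exact total (such as a tie) is a single entry of the list.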
Decided voters only (% Kerry win probability): AR 6, AZ 1, CO 1, FL 24, HI 67, IA 32, ME 100, MI 91, MN 82, MO 1, NC 0, NJ 99, NV 9, NH 93, NM 9, OH 30, OR 99, PA 85, TN 0, VA 0, WA 100, WV 0, WI 43.
Rank order of states: States currently in play in the 20-80% probability range, indicating a near-tie, are in bold. Turnout and how the undecideds break will shift which states are at a near-tie, but the order, from most Democratic to most Republican, should stay about the same.
Decided voters only: Democratic <- ME/WA/NJ/OR(95-100%) / NH / MI / PA / MN / HI / WI / IA / OH / FL / NV / NM / AR / CO/MO/AZ/TN/VA/WV/NC(0-5%) -> Republican
With undecided voters assigned. Undecided voters typically end up voting against the incumbent. In previous presidential races this has given a 2.5 ± 2.0% advantage to the challenger. I currently estimate that 3.0% of voters are undecided. A 3-1 Kerry-Bush split gives a +1.5% net advantage to Kerry. This leads to a median EV estimate: counting undecided voters, Kerry 283 EV, Bush 255 EV, and a nominal Kerry win probability of 73%, or 3-1 in Kerry's favor.
Turnout estimates and other corrections. The principal factor not measured by polls is turnout. Pollsters ask respondents questions to determine if they are likely to vote. However, this cannot capture efforts by voter turnout organizations. In addition, newly registered voters have no track record. Finally, telephone polls may not accurately sample the voting population. I estimate that these factors sum to an advantage in battleground states of 2 to 3% for Kerry. As I am sure you are all aware, this number cannot be known with certainty. With that caveat in mind, I use +1.5% as my turnout figure (see discussion at top of page). Combined with the +1.5% undecided allocation this makes a +3.0% bias as plugged into the MATLAB script. This leads to my final prediction. Predicted electoral outcome (11/1/2004 11:00pm EST): Kerry 311 EV, Bush 227 EV, nominal Kerry win probability 98%.
Note that all of these probabilities are conditional on the turnout and undecided voter assumptions being correct. The true probability is obtained by multiplying by a measure that is a function of whether my assumptions are accurate. The chance that I am wrong makes the true probability substantially lower than 100%! As Niels Bohr (and Yogi Berra) said, "Prediction is hard, especially of the future." Just for the record, my gut estimate of the likelihood of a Kerry win is about 6-1 in favor.
Another way to look at it (suggested by Abe Fisher) is to turn the undecided+turnout question around: in order to get the Kerry win probability above 90%, the sum of these two factors must be at least 2.2%. Note that to get to exact even odds, the needed amount is the Meta-Margin, listed today as 0.8%.
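Turning the question around in this way amounts to a root-finding problem: find the across-the-board shift at which the win probability hits a target. The sketch below does this by bisection. Everything in it is an assumption made for illustration: the three-state battleground, the normal model for each state, and the threshold of 29 EV are hypothetical and are not the site's data or method.

```python
from statistics import NormalDist

# Hypothetical toy battleground: (electoral votes, poll margin in points, standard error).
# Positive margins favor the candidate of interest.
STATES = [(20, -1.0, 1.0), (27, -1.5, 1.2), (10, 0.5, 0.8)]
NEEDED = 29  # hypothetical: a majority of the toy total of 57 EV


def win_probability(bias):
    """Probability of reaching NEEDED EV when every margin is shifted by bias."""
    dist = [1.0]
    for ev, margin, sem in STATES:
        p = NormalDist().cdf((margin + bias) / sem)
        new = [0.0] * (len(dist) + ev)
        for total, prob in enumerate(dist):
            new[total] += prob * (1.0 - p)
            new[total + ev] += prob * p
        dist = new
    return sum(dist[NEEDED:])


def bias_for_probability(target, lo=-10.0, hi=10.0):
    """Bisect for the shift at which win_probability equals target."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if win_probability(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(bias_for_probability(0.5), 2))  # shift needed for even odds
print(round(bias_for_probability(0.9), 2))  # shift needed for a 90% win probability
```

Bisection works here because shifting every margin in one candidate's favor can only raise that candidate's win probability.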
Back to predictions: Based on the probabilities below, Kerry is expected to win approximately 14-15 of the 23 states modeled, for a total of 23-24 states plus the District of Columbia.
Prediction, undecideds assigned, plus turnout (% Kerry win probability): AR 47, AZ 5, CO 20, FL 79, HI 97, IA 85, ME 100, MI 100, MN 99, MO 19, NC 0, NJ 100, NV 55, NH 100, NM 55, OH 83, OR 100, PA 99, TN 3, VA 7, WA 100, WV 5, WI 91.
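The expected number of states won is simply the sum of the individual win probabilities. As a quick check, this sketch adds up the percentages listed above; the variable names are only for illustration.

```python
# Kerry win probabilities (%) from the list above, undecideds and turnout included.
KERRY_WIN_PCT = {
    "AR": 47, "AZ": 5, "CO": 20, "FL": 79, "HI": 97, "IA": 85, "ME": 100,
    "MI": 100, "MN": 99, "MO": 19, "NC": 0, "NJ": 100, "NV": 55, "NH": 100,
    "NM": 55, "OH": 83, "OR": 100, "PA": 99, "TN": 3, "VA": 7, "WA": 100,
    "WV": 5, "WI": 91,
}

# Expected number of states won = sum of the per-state probabilities.
expected_states = sum(KERRY_WIN_PCT.values()) / 100
print(expected_states)  # about 14.5 of the 23 states modeled
```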
The popular vote. To estimate the popular vote I use two approaches: (a) one based on presidential preference polls and (b) one based on Bush's job approval numbers. In 16 national polls the medians (± SEM) are Bush 48.0 ± 0.4%, Kerry 47.0 ± 0.4%. Assuming 2.0% for Nader/other, the fraction of undeclared voters ("undecideds") is 3.0%. Assuming Cook's incumbent rule that undecideds split 3:1 for the challenger (2.25% and 0.75%), this gives a net of 1.5 ± 1.2% to Kerry. This predicts a national popular vote (not corrected for turnout) of Kerry 49.3 ± 0.9%, Bush 48.7 ± 0.9%, Nader/other 2%. The second measure uses job approval ratings. In ten polls taken since mid-October the median ± SEM is 49.0 ± 0.9%. Based on historical trends, this places an upper bound on Bush's share of the popular vote. Thus, both approaches indicate that Bush's popular vote share will be 49% or less.
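The undecided-voter arithmetic in the first approach can be written out in a few lines. This sketch uses only the figures quoted above; the function name and argument names are hypothetical.

```python
def allocate_undecideds(incumbent, challenger, other, challenger_share=0.75):
    """Split the undecided remainder between challenger and incumbent.

    All inputs are percentages; challenger_share=0.75 is the 3:1 split.
    """
    undecided = 100.0 - incumbent - challenger - other
    to_challenger = undecided * challenger_share
    to_incumbent = undecided - to_challenger
    return incumbent + to_incumbent, challenger + to_challenger, to_challenger - to_incumbent


bush, kerry, net_to_kerry = allocate_undecideds(48.0, 47.0, 2.0)
print(bush, kerry, net_to_kerry)  # 48.75 49.25 1.5
```

The results, 49.25% and 48.75%, are the unrounded versions of the shares quoted in the text, and the net of 1.5 points is the same advantage used in the state-level calculation.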
I use the turnout factor to make a final estimate. Predicted popular outcome: Kerry 50.0%, Bush 48.0%, Nader/other 2%. National polls come from davidwissing.com, RealClearPolitics, and yougov.com. Job approval numbers come from pollingreport.com.
An electoral tie. A 269-269 EV tie would throw the election into the House and Senate, which would most likely lead to the re-election of Bush and Cheney. However, this would be an emotionally divisive event. The probability of an electoral tie is: Decided voters only, 4.1% (24-to-1 against). With undecideds, 3.1% (31-to-1 against). Final prediction with turnout included, 0.4% (273-to-1 against).
The power of your vote (the jerseyvotes calculation). Previously I have discussed where you are most effective in your door-to-door activism. My unit is the jerseyvote, which is the power of a New Jersey voter to influence the national election. Among decided voters only, the current value of a single vote in the top states is (measured in jerseyvotes): Hawaii 11,900, Iowa 7,500, Wisconsin 7,400, Florida 7,200, Nevada 7,100, New Mexico 5,800, Ohio 5,600. Counting undecideds, the top states still appear and Arkansas joins the list. Other values of relevance are (decided voters) New Hampshire 2,200 and Pennsylvania 2,500. As you can see, a jerseyvote's value to American politics is what the Reichsmark's was to the Weimar German economy.
Key states. These statistics summarize polls completed between October 25 and 31.
In Florida (14 polls), Bush leads in 8 polls, Kerry leads in 4 polls, and two polls are tied. Bush's average (± SEM) margin is 1.4 ± 0.9% in polls. I predict a Kerry victory by 2%. Polls close at 6:00-8:00PM Eastern time.
In Ohio (17 polls), Bush leads in 12 polls, Kerry leads in 4 polls, and one poll is tied. Bush's average (± SEM) margin is 1.1 ± 0.7% in polls. I predict a Kerry victory by 2%. Polls close at 7:30PM Eastern time.
In Pennsylvania (14 polls), Kerry leads by 2.1 ± 0.7% in polls. I predict a Kerry victory by 5%. Polls close at 8:00PM Eastern time. General poll closing times are diagrammed here.
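Summary statistics of this kind can be computed from a list of poll margins as follows. This is a sketch under assumptions: the text does not say exactly how its SEM is computed, so the usual sample standard deviation divided by the square root of the number of polls is used here, and the example margins are made up.

```python
import math
import statistics


def summarize_margins(margins):
    """Return (mean, median, SEM) for a list of poll margins in percentage points."""
    n = len(margins)
    mean = statistics.mean(margins)
    median = statistics.median(margins)
    sem = statistics.stdev(margins) / math.sqrt(n)  # assumed definition of SEM
    return mean, median, sem


# Hypothetical margins from seven polls of one state (positive = one candidate ahead).
mean, median, sem = summarize_margins([3.0, 1.0, 4.0, 2.0, -1.0, 5.0, 2.0])
print(f"mean {mean:.1f}, median {median:.1f}, SEM {sem:.1f}")
```

The median is less sensitive than the mean to a single outlier poll.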
Bias analysis: The potential effects of differential turnout, splitting undecided voters, or systematic polling bias are as follows. The baseline from which bias is defined is decided voters only. Decisions by undecided voters and get-out-the-vote activities on Election Day will be major determinants of how large this bias effect is.
4 points towards Kerry: Kerry 325 EV, Bush 213 EV, Kerry win 99.9%.
3 points towards Kerry: Kerry 311 EV, Bush 227 EV, Kerry win 98%.
2 points towards Kerry: Kerry 291 EV, Bush 247 EV, Kerry win 86%.
1 point towards Kerry: Kerry 273 EV, Bush 265 EV, Kerry win 56%.
no swing (decideds only, flat turnout): Kerry 252 EV, Bush 286 EV, Kerry win 20%.
1 point towards Bush: Kerry 238 EV, Bush 300 EV, Bush win 97%.
2 points towards Bush: Kerry 217 EV, Bush 321 EV, Bush win 99.8%.
3 points towards Bush: Kerry 203 EV, Bush 335 EV, Bush win 99.99%.
4 points towards Bush: Kerry 186 EV, Bush 352 EV, Bush win 100%.
Thursday, November 4, 9:30AM: Jim G sends in his first look at county-level analysis of Florida and suggests that nothing is amiss. However, a full listing of results by voting method raises questions not about electronic voting, but about optical scan voting. I am open to any explanation for these numbers - resolution soon, I hope.
Monday, November 1, 8:45PM: For all of you writing me about the cell phone idea, I do not think this objection has merit. I have read the recent Zogby poll. I will write about this after the election.
Monday, November 1, 7:15PM: Upon reflection the last bit involves double-counting of turnout. Perhaps this factor should be less than 2.5%. This will be reflected when I incorporate the last polls.
Monday, November 1, 4:45PM: Gambling is a vice. But putting that aside, looking at TradeSports I think that a good cautious bet is to take equal positions against "Bush to win the election" (i.e. SELL, a bet that he loses) and against "Bush to win 250 or more EV" (i.e. SELL, a bet that he gets <250 EV). One way to do this quickly is to click on 'Live Help' at their site and tell the support representative you want to do a manual credit card charge. Other sites to try are here and here.
Monday, November 1, 9:00AM: Today I was interviewed live by Wall Street Journal This Morning. The program was carried on over 80 stations across the U.S. and on the Sirius and XM satellite networks. Listen to it here (MP3, 1.3 MB).
Saturday, October 30, 5:30PM: That was a fascinating experience. So much was left out but I did get to say my piece. After we cut, the FOX guy asked me what I really thought would happen. I said I thought Kerry would get over 300 electoral votes. Of course this happened afterwards. Typical television - very compressed. Real analysis soon - sorry about lack of substance on this post.
Thursday, October 28, 8:15PM: More letters.
Thursday, October 28, 3:00PM: Edward Witten writes, "On mydd.com, I read yesterday a rumor that a NYT poll of Florida showing Kerry ahead by +9 percent was buried as being implausible. I don't know if the rumor is true, and if it is I am sure the poll was flawed, as Kerry is surely not leading Florida by that amount. But to me it illustrates the fragility of trying to predict the election from the available state polls. Including or excluding a single, undoubtedly flawed, poll showing a +9 percent lead in Florida for Kerry (or Bush) would probably have a significant impact on your overall assessment of the outcome of the election." To some extent he has a point. If such a poll exists, then the decided-voters calculation moves Kerry up to Kerry 271 EV, Bush 267 EV, with a win probability of 51%. Using the median rather than the mean circumvents this problem a bit, and I will switch to it soon. In any event, the confidence interval is +/-36 EV. Therefore neither result is statistically significant. Any way you slice it, the election is a toss-up and will depend on turnout and undecideds.
Thursday, October 28, 9:45AM: Since posting a few letters here I have received many more. Here are some selected from yesterday's mail, October 27. Of special interest is one from Jim G. from New Hampshire, an undecided voter. He is articulate and thoughtful, and though we don't agree on a few things I strongly recommend his letter to all of you.
Evidently I spoke too soon regarding Bush giving up on Ohio. His travel plans include three stops there until Election Day. He is also pushing in Michigan, where Kerry has recently slipped a bit.
Wednesday, October 27, 5:30PM: A brief note on the vagaries of opinion polling. When we read polls we often make the implicit assumption that people report what is really going on inside their heads. However, this is a subjective report. The famous example in these closing days is the "undecided voter." But are these people undecided in the sense that we mean colloquially? Are they one monolithic category of person?
It's been pointed out that many undecided voters are unfavorable about the incumbent, and usually break for the challenger. This phenomenon may simply reflect the fact that some people are unable or unwilling to state a set preference. To cite a homely example, you may find yourself unable to articulate what you want for dinner, but you can react immediately to what you don't want.
Recently Scott Rasmussen reported data that he says supports the notion that late-deciding voters prefer Bush. The survey drew on only 136 late-deciding voters, far too few to reach statistical significance. This is a message poll aimed at driving the discussion in his preferred direction. Also, the survey assumes that the voters who decided during the survey period are similar in characteristics to those who wait until the last minute, possibly until they are standing in the voting booth. This is untested.
A parting thought on undecided voters: we are not going to resolve this by further argument! The best we can do is come up with a way to measure what they do, and wait until after the election. I will try to provide this as part of my final Election Night briefing document.
Other examples of respondent inaccuracy are the party-ID question, which can depend on when in the survey it is asked (especially if asked after the presidential preference!) and the question of who people voted for in the last election (on average, people show a tendency to think that they voted for the winner even if they did not).
Finally, once again: the probability map is not the same as the median calculation. This is why they do not match. If you were thinking about writing me to point this out, read this first.
Wednesday, October 27, 12:30PM: My regular email address works once again. Send your correspondence there.
Charles Cullen writes asking today's probability of a 269-269 electoral tie. Using decided voters only it's 3.9% - a lot! With the undecideds assumption it's 0.4%. In this scenario the newly elected House and Senate would determine the president and vice-president, leading to Bush-Cheney (if the Senate remains Republican) or Bush-Edwards (if the Senate goes Democratic).
Wednesday, October 27, 7:00AM: One of the pleasures of running a popular Web site is the correspondence. Click here for some selected letters from the last few days of various types - illuminating, entertaining, and unintentionally hilarious.
Wednesday, October 27, 5:30AM: Hawaii has been added because of two recent polls showing possible leads for Bush. This seems very unlikely. In any event, what is really needed is a third poll.
With that, let's think about a favorite subject of mine: why individual polls seem surprising or contradictory. I can think of three reasons:
1. Reporters often don't understand statistics. A poll showing Bush up by 5% and another showing Kerry up by 1% are in fact consistent with one another because of random sampling error. For more on this read yesterday's entry by Mystery Pollster (Mark Blumenthal). A better way to get a good answer is to examine many polls at once. For the record, Charles Forelle at the WSJ is a very notable exception - in his article about this Web site and others, he captured the subject perfectly!
2. Man bites dog. When a poll's finding sounds interesting, it gets more attention than a boring result. Therefore reports of outliers tend to grab headlines, creating apparent discrepancies.
3. Competition among organizations. News organizations usually rely on their own data alone. If they do this, they cannot achieve the increased accuracy that comes from comparing multiple polls. Indeed, little incentive exists to improve accuracy, since low accuracy leads to more frequent news stories, and therefore more readers or viewers.
Tuesday, October 26, 11:00AM: Statistically based analysis of the Electoral College is featured in today's Wall Street Journal. Welcome to new readers!
The overall raw polling outcome (decided voters only) is still a statistical tie. This is true even with more than 100 polls used in today's calculation. Bush has tiny leads among decided voters in Florida and Ohio, indicating that the outcome in these key states will be determined by undecided voters and turnout.
Finally, a note on national polls. In 8 national polls (two-way choice) the median result is Bush 49, Kerry 47. Assuming that methods are similar, the fact that this margin is larger than the Meta-Margin above supports the idea that the distribution of support for the candidates gives Kerry a small advantage.
Saturday, October 23, 10:00AM: I am working on a reference sheet to give you late next week. In addition to bottom-line predictions, this reference will give you a list of things to watch for on Election Night, along with key combinations that Kerry and Bush need. The content will change a bit in the coming week as the last polls come in. However, some outlines are now coming into view.
Under today's polling conditions, four states are clearly in serious contention: Florida, Iowa, Ohio, and Wisconsin. To a lesser extent so are NV, WV, and some others. Depending on undecideds/turnout/bias, states come into or go out of play, but in those situations Kerry or Bush typically win the Electoral College by a more comfortable margin. So let's concentrate on this near-tie condition. After assigning other states as indicated by polls (PA and MI to Kerry, MO to Bush, and so on) and playing with combinations, several patterns emerge.
First - if Kerry wins Florida, the election is over - he wins. Kerry can also win by taking Ohio plus one of the smaller states. In the other direction, Bush must win not only Florida, but also either Ohio or all of the smaller states. In light of these facts, the Saletan piece (see below) indicates that the Bush campaign's actions may amount to a defensive move - otherwise why give up on Ohio?
It's also possible to identify states that look moderately solid, but might flip if the combination of undecided/turnout/bias factors adds up. This is interesting because this shift is likely to be similar across states. Therefore these states can act as an early-warning system for a surprising election night. For instance, Arkansas, North Carolina and Virginia currently look like Bush states, and Maine looks like a Kerry state. If a surprise occurs in any of these states, this might presage a significant offset between decided-voter polls and the real outcome.
Wednesday, October 20, 4:00PM: I've received lots of feedback on undecided voter assignment, much of it constructive. This has led me to rearrange the way the results are presented.
First: the calculation is now set to its old definition from two days ago. Many of you are very familiar with the raw (decided voters only) calculation by now. Switching back was suggested by many readers of various political persuasions. Whether people liked the direction of the outcome or not, many were uncomfortable with the mixing of current numbers and previous election outcomes. Also, this site has many calculations that are based on decided voters only, and it only adds confusion to redo those.
Second: the assignment of undecided voters is now done probabilistically, like the rest of the calculation. Past elections from 1956 to 1996 show a wide range of undecided breaks for the challenger: [+3 +6 +2 +1 +6 +0 +2 +4], median 2.5%, estimated SD 2.0% (analysis). This year may be unusual, though note in 1972 a break of 2% away from Nixon, at the height of the Vietnam conflict and after the invasion of Cambodia. Anyway, because the contribution is variable, the undecideds-assigned calculation (MATLAB script) takes this variation into account. The results are listed in the box above. This is my own current prediction. Also, the state probabilities are now given both with decideds only and with undecideds added. Thanks to Alan Cobo-Lewis and Rachel Findley for key discussions.
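As a quick check on the summary statistics quoted above, here is a minimal Python sketch (not the MATLAB script itself). The eight values are the ones listed in the entry; the variable names are my own, and which SD estimator was used for the "estimated SD 2.0%" is not stated, so both common versions are shown.

```python
from statistics import median, pstdev, stdev

# Undecided breaks toward the challenger (percentage points), as listed above.
challenger_breaks = [3, 6, 2, 1, 6, 0, 2, 4]

break_median = median(challenger_breaks)
# Two common spread estimates; the entry does not say which one it used.
break_sd_population = pstdev(challenger_breaks)  # divides by n
break_sd_sample = stdev(challenger_breaks)       # divides by n - 1

print(f"median break: {break_median:.1f} points")
print(f"population SD: {break_sd_population:.2f}, sample SD: {break_sd_sample:.2f}")
```

The median reproduces the 2.5% figure, and either SD estimate comes out close to 2 points.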
Third: There are now two maps (see box). The static image below is set with undecideds assigned.
Now, to the interesting bits. Look at the state probabilities. Because the undecideds could break evenly or for the challenger, many states are still toss-ups, including Florida and Ohio. The lingering uncertainty reinforces the idea that the election is close enough to be determined by turnout. Even if the undecideds break evenly, a 2% difference in turnout could change the result drastically, which you can see in one direction by comparing the maps. Have I mentioned before that I think turnout is important? Turnout is very, very important.
Whew, that was tiring. I think I need a wee dram!
Tuesday, October 19, noon: Today I implement the first major change to this calculation - I am allocating undecided voters. To do this I use past presidential election voting patterns, specifically the incumbent rule as described by Charlie Cook of the National Journal. This gives a more accurate snapshot and is a step toward making an actual prediction.
Rationale: It is known to poll analysts that voters who are undecided usually end up voting against the incumbent. In particular, compared with their final poll numbers, incumbents get between 2% less and 1% more. In contrast, challengers do better on average by 3%. These figures are consistent with Cook's estimate that undecideds split at least 75% for the challenger. In today's summary of national polls, the average Bush-Kerry split is 48.5-45.5, which sums to 94%. Assuming 2% for Nader and other candidates, the remaining undecideds are 4%. Splitting these by Cook's rule gives 1% to Bush and 3% to Kerry, reducing the margin by 2%.
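The arithmetic in the rationale above can be sketched in a few lines of Python. The function name and its parameters are hypothetical, chosen for illustration; the 75% challenger share is Cook's lower bound as quoted above, and the poll numbers are the ones in this entry.

```python
def allocate_undecideds(incumbent, challenger, other, challenger_share=0.75):
    """Split the undecided remainder between incumbent and challenger.

    All inputs are percentages of the electorate. challenger_share is the
    assumed fraction of undecideds who break for the challenger.
    """
    undecided = 100.0 - incumbent - challenger - other
    to_challenger = undecided * challenger_share
    to_incumbent = undecided - to_challenger
    return incumbent + to_incumbent, challenger + to_challenger


# National averages from this entry: Bush 48.5, Kerry 45.5, Nader/other 2.
bush, kerry = allocate_undecideds(48.5, 45.5, 2.0)
margin_before = 48.5 - 45.5
margin_after = bush - kerry
shift_toward_kerry = margin_before - margin_after

print(f"Bush {bush:.1f}, Kerry {kerry:.1f}, shift {shift_toward_kerry:.1f} points")
```

With these inputs the 4% of undecideds split 1 to Bush and 3 to Kerry, so the Bush lead narrows from 3 points to 1.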
Therefore, for the main calculation I will assume that the undecided-voter shift is +2.0% towards Kerry, shift state polls by this amount (using the variable already provided in the script), and proceed with the calculation. Based on state polls in Florida, Ohio and Pennsylvania, I estimate that the proportion of undecided voters in these states is similar to the national figures. Because national polls come more frequently, I will use them to calculate the shift. The size of this shift may change in the final days, and I will be monitoring this.
This new estimate is likely to be more accurate. However, it is also the first change to the calculation that is not neutral: it goes beyond the polling numbers themselves, and it is in a direction that is favorable to my candidate. For example, Florida, Iowa, Ohio and Wisconsin are still toss-ups, but they are now above the 50% probability threshold for Kerry. Therefore I will continue to report the results without this adjustment. This is listed in the box above on the line labeled "Decided voters only." The corresponding Meta-Margin can be calculated by subtracting 2.0% from the value listed.
To read more about the incumbent rule, see Charlie Cook, Guy Molyneux, the Los Angeles Times, Mark Shields, the Mystery Pollster, and a contrarian.
I have also simplified the box by removing the line about the Colorado ballot initiative, which, based on a recent poll and Salazar's opposition, seems likely to fail.
Finally, I will continue listing rankings and probabilities for all states. I have decided that there is little benefit to leaving these out.
Hitting the streets: How much do you affect the election by getting out the vote? Also, where are your efforts most valuable? To help guide your efforts, here is a synthesis of previous posts. Once you decide, I recommend contacting your local Democratic (or Republican) organization or America Coming Together.
This question can be answered by calculating how much the Electoral College win probability is changed by one person's vote. This affects where you should go because as an individual, you can only get out a finite number of votes. Today the best states to go to are Iowa, Ohio, Nevada, and Florida. Nevada, while small, is on the list because it is a near-tossup and has relatively few voters per electoral vote. Here is a case study. If you are a New Jersey resident, your vote has some value, but it is low since the state is very likely to go Democratic by a substantial margin. In contrast, driving a voter to the polls in Pennsylvania is worth nearly 300 times as much. If you go to Ohio each vote is worth even more, over 500 "jerseyvotes." The top states are IA (686 jerseyvotes), OH (528), NV (508), FL (372), NM (304), WI (295), PA (295), MO (199), AR (151).
Although the calculation is unbiased, I am not. I am a Democrat. To see a list of races I consider critical, see my ActBlue page. My advice to all voters (including Republicans) is the same: Go to battleground states. Register voters. Make phone calls and knock on doors (a very effective strategy) to canvass for voters. Vote absentee or vote early (online resource), and on Election Day, work to get out the vote.
Current percentage probabilities of a Kerry win in each battleground state, computed from the last three polls or going back seven days, whichever gives more polls. States in boldface had a new poll completed and released since the last day of updates. The probabilities are calculated assuming that the SEM cannot go below 2%. Click on a state to view a tabulation of most of the polls. Some of the others come from these data sources (some are subscription-only). Other sources are electoral-vote.com and RealClearPolitics. All data are visible in this MATLAB script.
This is a history of the calculation. For this calculation each poll is assigned to the last date on which polling was done. The marked events are inspired by a similar graph by electoral-vote.com. In my graph, the effect of events is clearer because I use polling margins and because I average over three polls. Fahrenheit 9/11, adding John Edwards to the ticket, and the Democratic convention seemed to have measurable effects within a few days. The passing of Ronald Reagan and the assault on Kerry's war heroism did not. The last update was October 12. Note: Around the time of the first debate I started using more polls per state. This and the start of Rasmussen daily tracking polls has complicated updates. Therefore after September 25 the graph is simply a record of previous daily updates - not quite the same. For instance, in this graph the bounce from the first debate looks delayed. In fact, it was immediate. This graph will be done more properly soon.
Wednesday, November 10, 8:00AM: Regarding the ongoing Florida voting fraud controversy, S. Doershuk, who has expertise in demography, makes a constructive suggestion: "It might be instructive to examine some similar counties in Georgia and Alabama, particularly those which border on north FL, to see if the same pattern can be found. Given the 'bright red' nature of both border states, I wouldn't expect anyone to have bothered to tamper with vote totals there, so a comparison could be instructive." This is excellent. If any of you has this information I would be very interested.
Saturday, October 30, 2004, 11:00AM: Update comes later. Thanks to readers for pointing out the WV mistake (it should be solidly red). The calculation and maps above are updated to correct this error; the bias calculations have to wait until later. I must now go and get duded up for the FOX News thing (tune in, 2:45PM Eastern)...
Friday, October 29, 2004, 11:30PM: After due consideration, I need to stick with using the mean in order to be consistent with previous analyses. Thanks for your patience - that's what you get for a project that is essentially open-source - that is, I shoot first and take your feedback later. In retrospect the best approach might have been to use a filter weighted by time going back more than seven days. This will have to wait until 2008.
Pollster John Zogby was on The Daily Show last night. He said what I have been telling you for many weeks: a break of undecideds towards Kerry is likely to be enough to get him over the top. I did not watch the program, but I think Zogby did not mention turnout. Too bad - I remain convinced that turnout will be the decisive variable in this race.
The polling-related comments section is getting a bit out of hand. If anyone is aware of a chat room that focuses on this site, please let me know. I would love to restrict my comment thread to technical and other serious inquiries.
Friday, October 29, 2004, 10:00AM: Tune in to FOX News on Saturday (tomorrow) around 2:45PM, when I am scheduled to talk about the meta-analysis.
I am traveling today until evening, so the next update may be late in coming. If nothing shows up in the next four hours, check back late tonight. I also hope to have the briefing sheet I promised before, but other events have slowed me down a bit and I might not have a good version until Sunday. In the meantime, the fastest state poll updates can be found at race2004.net and The Hedgehog Report.
First, I apologize for the relative lack of updates and commentary! I face logistical hurdles, including my professional society's annual meeting (where I am), intermittent Internet access, and university mail system failures.
This site is mentioned in an article on polls in today's Newsday. However, there is one error - my margin of Bush over Kerry counts decided voters only, and does not include undecided voters.
I no longer list the overall probability of a Kerry win with undecideds allocated. This is because the uncertainty of how undecideds will break is accounted for state by state, but the compound probability calculation assumes independence among states, which is unlikely. The true probability is, roughly speaking, the probability that the undecided advantage (which I assume is 2.5 ± 2.0% for Kerry) and the Meta-Margin (currently 0.5% for Bush) sum to a positive value for Kerry. Today this probability is around 75%. To take away the stat-speak: given the history of what undecided voters do, today I give Kerry 3-1 odds over Bush. The median EV count with undecideds assigned is still OK.
Regarding the Hawaii question, it's possible that this state is competitive but right now there are not enough data to say. Stay tuned.
A story in today's Washington Post confirms what I have been suspecting: the Bush campaign is in trouble, and Bush-Cheney campaign insiders recognize this. It's consistent with the defensive move of pulling out of Ohio for a last stand in Florida.
By the way, I am having email troubles on the university server. To reach me cc your messages to mindgeek at gmail dot com.
At Slate, Will Saletan points out that based on Bush's travels, his campaign may consider Florida more of a must-win state than Ohio. Looking at today's decided-only numbers, this has merit. If Bush takes Ohio his win probability is 72%, but if he loses then this drops to 40% - not even a twofold difference. However, Florida is different. If Bush wins Florida, his win probability is 88%; if he loses, it's only 20%. To show why this is, Saletan describes electoral scenarios involving smaller states (WI, IA, NV, NM) that Bush could cobble together to make up for the loss of Ohio.
Although I don't analyze national polls, I am asked about them frequently. For instance, how to interpret the latest Gallup poll reporting a Bush-Kerry margin of 8%? My brief reply: if you look carefully at all available polls, the race is closer than this single poll indicates. Consider the following.
Imagine that the race were perfectly tied and the margin of error were 4 points. In this case six measurements of the Bush-Kerry margin could easily be: Kerry +2, Bush +2, tie, Bush +6, Kerry +6, Kerry +1. Add the fact that the CNN/USA Today/Gallup poll is somewhat favorable to Bush compared with other polls, and one can see the problems with interpreting any one poll. If a national horserace summary is required for the sake of curiosity, then looking at an average (here is more data) or a median is better. If one does this, Bush is currently about 2 points up on Kerry among decided voters.
This brings me to the biggest point of all: undecided voters are not counted in point spreads, yet history suggests that most of them vote against the incumbent. This suggests that Bush's true threshold separating victory from defeat is about 49%; he is currently slightly below that. This is the big story among pollwatchers this week. For a discussion see these L.A. Times and CNN articles.
Colorado Democratic Senate candidate Ken Salazar has come out clearly against Amendment 36, the electoral vote splitting initiative. It seems likely to fail.
I have been asked to evaluate a tactical-voting idea proposed by Nader supporters, votepair.org. The idea is for swing state voters to agree to change their Nader vote in exchange for a vote change in a non-swing state. Nader is a much smaller factor this year than in 2000, but it is still worth considering how much value you get for your trade. (This relies on the same calculation that I did on Wednesday the 13th to ask where you should get out the vote.)
Regarding Nader: Today the NY Times says that Nader is a threat to the Democrats. Possibly, but this is not supported by the data! In the nine states listed in the article, I looked at 20 polls since October 1 with both two-way results (Kerry-Bush) and three-way results (Kerry-Bush-Nader). Nader does not change the outcome in any of these - and several are ties.
The Washington Post hosts an online chat with Charlie Cook. This is excellent, a treat. He slams the quality of polling information available on the Internet. Despite being a purveyor, I agree with him. How's that for mind-bending?
The effects of the last debate won't show up until next week. Polls typically take at least two days to complete, and pollsters usually start fresh after a big event. This is why results after the second debate are only trickling in now. In the meantime, more people think Kerry won than think Bush won the third debate, by between 1 and 14 points (CBS 39-25, ABC 42-41, CNN/Gallup 52-39, Democracy Corps 41-36). In addition to overall numbers, Kerry was favored among undecideds by 14 points (CBS), and among independents by 7 points (ABC) or 20 points (CNN/Gallup). Among independents in battleground states, where it matters most, the margin was 9 points (Democracy Corps [D]). A summary of polls for all three debates can be found here.
Lately I have been asked what-if questions (What if Bush wins Ohio? What if Kerry wins Wisconsin?). I have three ways of answering this type of question. The last answer may be of practical use in guiding your activism!
1. Flipping states: How much is the win probability affected by guaranteeing a given state? If Kerry wins Florida, then his overall win probability today jumps to 83% (five-to-one odds). If Bush wins Ohio then his win probability is 87% (eight-to-one odds).
2. Shifting the margin: What is the benefit of changing the margin by one point? You could imagine a campaign strategist making use of this to help decide where to place ads. For both candidates the three best states are Ohio, Florida and Pennsylvania. No surprises there.
3. Hitting the streets: How much do you affect the election by going somewhere to get out the vote? The way to do this calculation is to see how much the Electoral College win probability is changed by incrementing a state's margin by some fraction F, where F is inversely proportional to the state's voting population. This is because as an individual, you can only get out a finite number of votes.
Today, the best states to go to, in descending order, are: Iowa, Ohio, Nevada, and Florida. Things change a little bit if the margin is different from the estimate (for instance towards Kerry because of the incumbent rule as originated by Guy Molyneux and reviewed by the Mystery Pollster and Mark Shields), but the top four states always include Ohio and Nevada. Why Nevada? Nevada is a near-tossup and has a disproportionately high share of electoral votes.
As previously mentioned, I have been looking at the party-ID numbers in the Gallup data. I have found evidence that party ID is not fixed over time. The Gallup poll internal numbers contain the fraction of voters who call themselves Republican, Democratic, or independent. The average GOP fraction is 39%, but fluctuates. The fluctuation has been the source of much discussion and is said to be too high. As it turns out, the amount of fluctuation can be predicted from binomial statistics if the fraction of Republicans (for instance) is fixed over time. The expected standard deviation is sqrt(r*(1-r)/N), where r is the fraction 0.39 and N is the number of people per poll, about 1000. These numbers predict a standard deviation of 1.5%. From Gallup's data, the actual standard deviation is 2.9%, almost twice this. This suggests that party ID, as Gallup measures it, shifts over time. This supports the defense by Gallup that weighting by party ID distorts the result.
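For readers who want to check the binomial estimate, here is a minimal Python sketch. The function name is my own; r = 0.39, N = 1000 and the observed 2.9% are the figures quoted above.

```python
from math import sqrt


def sampling_sd(fraction, sample_size):
    """Standard deviation of a sampled proportion under pure binomial sampling."""
    return sqrt(fraction * (1.0 - fraction) / sample_size)


expected_sd = sampling_sd(0.39, 1000)  # GOP fraction, respondents per poll
observed_sd = 0.029                    # actual SD in Gallup's data, from the text
ratio = observed_sd / expected_sd

print(f"expected SD: {expected_sd:.2%}, observed/expected: {ratio:.2f}")
```

The expected value comes out near 1.5%, so the observed fluctuation is a bit under twice what sampling error alone would give.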
However, using unweighted data has its own problem, namely that the sample may be consistently biased in one direction or the other. In 2000, Rasmussen did not weight and predicted a margin that was 9 points more favorable to Bush than the final outcome. This is the accusation currently being made against Gallup.
But the cure may be as bad as the disease, as exemplified by Rasmussen's new approach. Rasmussen now weights, and now his presidential tracking poll fluctuates very little. Because party ID and preferred candidate (Bush/Kerry) are strongly correlated, this means that his weighting procedure will always work to reduce the margin of the leading candidate. This may explain why his poll is so stable - statistically, too stable to be right. In recent 3-day tracking data (analyzing every third day) the standard deviation is 0.7%, whereas random fluctuation alone predicts an SD of 1.6%.
The real problem with weighting is as follows: The horserace result depends on assumptions on party ID. If these covary with sentiment, then real changes will be filtered out, and it will be very hard to learn from weighted data on who is ahead, a basic fact we want from polls. We can see an example of this today because a recent poll from Zogby shows little change from the previous poll.
Therefore I currently think that both weighting by party ID (Rasmussen now) and not weighting at all (Gallup now, Rasmussen in 2000) have serious problems. A better way to weight would be to use a question or questions with fixed answers, such as "Who did you vote for in the last election, Bush, Gore or Nader?" Time magazine does this, but does not weight. Of course, the unreliability of memory is a problem. Zogby Interactive does a sensible version of this: party ID is asked at a different time than candidate preference, which might de-link the variables. If anyone knows of other organizations that go beyond simple party-ID-weighting, please let me know.
The median electoral vote (EV) estimate is very sensitive to swings in reported opinion because of the winner-take-all mechanism of awarding EV. Under near-tie conditions I find that the change is about 30 EV per 1-point change in the popular margin. Therefore Kerry's approximately 110-EV slide since mid-August represents a 4-point swing, equivalent to 2% of voters switching from Kerry to Bush. October 5 and 9 corrections: Looking at the numbers more carefully, from August 1 to mid-September the national popular margin swung by about 8 points, 4% of voters switching. This works out to 12-15 EV gain for one candidate per 1-point change in margin or turnout. For comparison, past EV outcomes were 2000 (Bush) 271-266, 1996 (Clinton) 379-159, 1992 (Clinton) 370-168, 1988 (Bush elder) 426-111-1, 1984 (Reagan) 525-13, 1980 (Reagan) 489-49, 1976 (Carter) 297-240-1. October 9: Historically, the Electoral College margin has shown, on average, a 29 EV margin per 1% popular margin. This is consistent with my calculations this year. Since June, neither Kerry nor Bush has gotten much past 320 EV, demonstrating that this is the close race that both campaigns have predicted all along.
In addition to the final polls, the outcome will ultimately be adjusted by three big factors: (a) undecideds, (b) new voter registration, and (c) turnout. Undecideds usually break for the challenger, though this is not certain. Newly registered voters should in principle be reflected in polls, though how many of them pass likely-voter criteria is unclear. Turnout is a big unknown (though a known one). In 2000 Democrats did better than expected from pre-election polls. In 2002, Republicans did better. This year, unusual levels of progressive activism would seem to favor Democrats. But prediction is hard, especially of the future.
Relevant to the current Gallup controversy: Here is a table of Gallup national presidential polls, along with Party ID statistics for each poll. The GOP-Democratic margin in the poll correlates quite closely with the Bush-Kerry margin. In fact, the correlation coefficient is 0.73 (r^2 = 0.53, P < 0.001). Put into lay terms, this means that Party ID gap and Bush-Kerry margin vary together, and variation in one can account for over half the variation in the other. The slope of a linear fit between the two is near 1: on average, every extra Republican in the sample added one to Bush's margin.
This seems consistent with the idea that how much Gallup samples from each group (Republicans, Independents, Democrats) affects poll outcome. But could it be the other way around: could voter sentiment affect self-reported party ID? One test of this is to see if the group sizes fluctuate as much as would be expected by chance. For a sample of 1000 voters composed of 39% R, 34% D, and 25% I (the average of all those polls), the percentage of R's would be expected to have a standard deviation of 1.5%. The actual SD is almost twice as large, 2.9%. Therefore party ID does vary more than sampling error would suggest. Maybe it varies with sentiment. Or maybe conditions at Gallup change (for instance, the time of day and week that calls were made). Hard to know. One thing is clear: their average party ID breakdown does not match known values (35% R, 38-39% D, 26% I).
You can use the bias calculation to estimate where things are headed. If you think turnout will boost your candidate by N points, add that. If you think that one candidate will gain X points at the expense of the other, add 2*X. For instance, if turnout will increase Kerry's vote by 2 points, but Bush will pick up 1.5% of voters from Kerry, then the bias is 2 - (1.5 * 2) = -1%, or 1% to Bush.
For margin of error junkies: Rachel Findley pointed out this comparison of polls to election outcomes, which finds that only 84% of election outcomes fall within the reported 95% confidence interval. This discrepancy allows a way to estimate polling errors that go beyond sampling error. If the additional error is normally distributed, an appropriate correction would be to increase the reported margin of error by a factor of 1.4.
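The factor of 1.4 can be checked with a few lines. This is a sketch under the assumption stated above, that the total error is normally distributed: if the reported 95% interval really covers only 84% of outcomes, the true standard deviation is larger than the reported one by the ratio of the corresponding z-scores.

```python
from statistics import NormalDist

def moe_inflation(nominal=0.95, observed=0.84):
    """Factor by which a reported margin of error must grow so that an
    interval with `observed` real coverage reaches `nominal` coverage."""
    std = NormalDist()
    z_nominal = std.inv_cdf(0.5 + nominal / 2)    # two-sided 95% -> about 1.96
    z_observed = std.inv_cdf(0.5 + observed / 2)  # two-sided 84% -> about 1.41
    return z_nominal / z_observed

factor = moe_inflation()  # close to 1.4
```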
However, this correction does not apply to my calculation because instead of relying on reported MoE, I use inter-poll data to make an independent estimate of variance.
These calculations are based on state polls from many polling organizations (data sources). The primary source is Race 2004, which emphasizes likely voters (LV). Whether Nader is included depends on his ballot status in that state; if his status is unknown, he is included. Polling organizations that provide rolling averages (Rasmussen FL, MI, MN, OH, PA) are updated twice a week. The data are fed through a MATLAB script that computes all of the above results.
The first step is to calculate the probability of winning a state, taking into account the variability of polls. This is done by calculating simple statistics on the polls: average and standard error of the mean (SEM). This is then converted to a probability of a win using the normal distribution (bell-shaped curve).
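As an illustration of this first step, here is a minimal Python sketch (the actual calculation is done in MATLAB, and this is not that script). The poll margins are made-up numbers, and `win_probability` is a hypothetical helper name.

```python
import math
from statistics import NormalDist, mean, stdev

def win_probability(margins):
    """Probability that a candidate leads in a state, given recent poll
    margins (candidate minus opponent, in percentage points)."""
    avg = mean(margins)
    sem = stdev(margins) / math.sqrt(len(margins))  # standard error of the mean
    if sem == 0:
        # All polls agree exactly; treat the state as decided (or tied).
        return 1.0 if avg > 0 else 0.0 if avg < 0 else 0.5
    # Area of the normal curve on the candidate's side of zero.
    return NormalDist().cdf(avg / sem)

example_margins = [2.0, -1.0, 3.0]   # hypothetical last three polls
p = win_probability(example_margins)
```

Flipping the sign of every margin gives the opponent's probability, which is one minus the candidate's.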
The second step of the calculation is complex: it calculates the probability of every possible outcome. For 17 states the total number of possibilities is 2^17 = 131,072. Adding Colorado, Tennessee, North Carolina and Virginia makes over 2 million possibilities. In order to reduce computing time, probabilities less than 0.1% or greater than 99.9% are classified as certain outcomes. Each possibility corresponds to a different number of electoral votes (EV).
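A brute-force version of this second step might look like the sketch below. It is illustrative only: the state list is hypothetical, and the 0.1%/99.9% cutoff is applied as described above, so near-certain states are removed before enumeration.

```python
from itertools import product

CERTAIN = 0.001  # below 0.1% or above 99.9% counts as a certain outcome

def split_certain(states):
    """Return (uncertain states, electoral votes already locked in)."""
    uncertain, locked = [], 0
    for votes, p in states:
        if p > 1 - CERTAIN:
            locked += votes            # certain win
        elif p >= CERTAIN:
            uncertain.append((votes, p))
        # p < CERTAIN: certain loss, contributes nothing
    return uncertain, locked

def enumerate_outcomes(states):
    """Map each possible electoral-vote total to its probability by
    walking through all 2^N win/loss combinations."""
    uncertain, locked = split_certain(states)
    dist = {}
    for outcome in product((False, True), repeat=len(uncertain)):
        prob, ev = 1.0, locked
        for won, (votes, p) in zip(outcome, uncertain):
            prob *= p if won else 1.0 - p
            if won:
                ev += votes
        dist[ev] = dist.get(ev, 0.0) + prob
    return dist

# Hypothetical (electoral votes, win probability) pairs, not real poll data.
battlegrounds = [(27, 0.5), (20, 0.6), (10, 0.3), (5, 0.9995)]
dist = enumerate_outcomes(battlegrounds)
```

The loop runs 2^N times, which is why 17 states means 131,072 passes and 21 states means just over 2 million.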
Those are then tabulated to come up with a 50th-percentile (median) outcome, as well as a 95-percent confidence interval. The 95-percent confidence interval is particularly useful because, like the famous Margin of Error (MoE), it gives the range of outcomes that would occur 95 percent of the time based on the available information. Note that this confidence interval is very similar to the 50th-percentile outcome from a 1-point bias towards Bush or towards Kerry. Note, October 21: The confidence interval varies in size somewhat. When large states (such as OH, WI, FL) are toss-ups the confidence band can be up to twice as large.
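Reading the 50th percentile and the 95-percent interval off a tabulated distribution can be sketched as follows. The distribution here is a made-up example, and taking the 2.5th and 97.5th percentiles as the interval endpoints is an assumption about how the interval is defined.

```python
def percentile_ev(dist, q):
    """Smallest electoral-vote total whose cumulative probability reaches q."""
    cumulative = 0.0
    for ev in sorted(dist):
        cumulative += dist[ev]
        if cumulative >= q - 1e-12:   # small tolerance for rounding error
            return ev
    return max(dist)

def summarize(dist):
    """Return (median EV, (lower, upper) bounds of the central 95%)."""
    median = percentile_ev(dist, 0.5)
    interval = (percentile_ev(dist, 0.025), percentile_ev(dist, 0.975))
    return median, interval

# Hypothetical distribution: electoral-vote total -> probability.
example = {250: 0.02, 260: 0.18, 270: 0.40, 280: 0.30, 290: 0.10}
median, interval = summarize(example)
```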
Although this calculation takes into account the variability of polls, it is important to note what it does not do. It is integrated over the last three polls (mostly 1-4 weeks), so fast swings do not show up. It does not reject any polls, nor does it account for potential bias or predict future opinion shift in any way.
October 6: I am now using a far faster way of calculating the probability distribution in closed form (see discussion). Thanks to Lee of Quant Consulting.
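The linked discussion has the details of the faster method. One standard way to get the exact distribution without enumerating every combination, shown here as an assumption rather than as a description of that method, is to fold in one state at a time: each state either adds its electoral votes (with probability p) or adds nothing (with probability 1 - p). This is the same as multiplying out the polynomials (1 - p + p·x^EV) for all states. The state list is hypothetical.

```python
def ev_distribution(states):
    """Exact electoral-vote distribution; dist[k] is the probability of
    winning exactly k electoral votes from the listed states."""
    total = sum(votes for votes, _ in states)
    dist = [0.0] * (total + 1)
    dist[0] = 1.0
    for votes, p in states:
        new = [0.0] * (total + 1)
        for ev, prob in enumerate(dist):
            if prob:
                new[ev] += prob * (1.0 - p)      # state lost
                new[ev + votes] += prob * p      # state won
        dist = new
    return dist

# Hypothetical (electoral votes, win probability) pairs.
dist = ev_distribution([(27, 0.5), (20, 0.6), (10, 0.3)])
```

The work grows with (number of states) × (total electoral votes) instead of doubling with each added state.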
Polls are unfiltered and equally weighted, in part because selecting data leads to unintended biases. Therefore even though some polling organizations give demonstrable and consistent outlier results, such as state-level Fox News polls (example), Fairleigh Dickinson Public Mind, and the Badger Poll, all polls are still included. I have also included Zogby Interactive polls, which have relatively untested methods but do not show measurable bias in either direction from the average.
However, even when all of the above polls are excluded the result is virtually identical. Thus the method can be tailored but is also robust enough to give a reasonable answer even with no selection of data. For a fuller discussion of my methods, see this DailyKos thread.
These calculations would be affected if there is an overall poll bias, which can have a large effect in a close race. Bias could happen if polling methods do not accurately sample actual voting patterns. However, in 2000, Ryan Lizza at The New Republic compiled state polls. On the day before the election, that compilation indicated that the outcome would hinge on Florida. This matches what happened, arguing against major built-in biases in state polls.
Other factors may have an effect of unknown size, such as increased motivation by Democrats or the possibility that undecided voters will break against the incumbent.
A key measure of the current closeness of the race is the Popular Meta-Margin (a.k.a. Swing Index). This is the across-the-board percentage shift in opinion (or poll bias) that would be needed to make the electoral college an exact toss-up. It is analogous to the popular margin in national polls, but is more relevant to what it would take in terms of real electoral mechanisms.
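One way to compute such a shift is sketched below. It rests on assumptions not spelled out here: "exact toss-up" is read as a 50% chance of reaching 270 electoral votes, each state is summarized by a mean margin and an SEM, and the answer is found by bisection. All names and numbers are illustrative.

```python
from statistics import NormalDist

def win_chance(states, shift, base_ev, needed=270):
    """Chance of reaching `needed` EV after adding `shift` points to the
    candidate's margin in every state. states: (votes, margin, sem)."""
    std = NormalDist()
    total = base_ev + sum(votes for votes, _, _ in states)
    dist = [0.0] * (total + 1)
    dist[base_ev] = 1.0                # safe electoral votes come first
    for votes, margin, sem in states:
        p = std.cdf((margin + shift) / sem)
        new = [0.0] * (total + 1)
        for ev, prob in enumerate(dist):
            if prob:
                new[ev] += prob * (1.0 - p)
                new[ev + votes] += prob * p
        dist = new
    return sum(dist[needed:])

def shift_to_tossup(states, base_ev, lo=-20.0, hi=20.0, tol=1e-6):
    """Uniform shift (in points) that makes the win chance 50%.
    A negative value means the candidate currently leads by that much."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if win_chance(states, mid, base_ev) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical: 260 safe EV, one 20-EV state where the candidate leads by 1.5.
shift = shift_to_tossup([(20, 1.5, 1.0)], base_ev=260)   # about -1.5
```

In the one-state example the answer is simply the lead in that state with the sign flipped. With many states the shift depends on how the close states combine.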
Bias occurs if (a) polling organizations give skewed results (on Election Eve 2000, they favored Bush by 2.5%) or if (b) one side turns its voters out better or worse than predicted. The closeness of the race for the last few months (less than 3% in either direction) indicates a heavy role for registration of new voters and election-day turnout.
You can use the bias calculation to estimate where things are headed. If you think turnout efforts will boost your candidate by N points, add that. If you think that one candidate will gain X points at the expense of the other, add 2*X.
For instance, if you predict that turnout will increase Kerry's vote by 2 points, but Bush will pick up 1.5% of voters from Kerry, then the bias to use is 2 - (1.5 * 2) = -1%, or 1% to Bush.
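The factor of 2 arises because a voter who switches sides lowers one candidate's share and raises the other's, so the margin moves by twice the switch. The same arithmetic as a small sketch (the function name is illustrative):

```python
def net_bias(turnout_boost, points_lost_to_opponent):
    """Net shift in margin, in points, toward the candidate.
    Switched voters count double because they leave one column
    and enter the other."""
    return turnout_boost - 2 * points_lost_to_opponent

# The example from the text: +2 from turnout, 1.5 points switching to Bush.
kerry_bias = net_bias(2.0, 1.5)   # -1.0, i.e. 1% to Bush
```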
My academic specialties are biophysics and neuroscience. In these fields I make heavy use of probability and statistics in analyzing complex experimental data, and have published many papers using these approaches. Polling provides an interesting everyday example.
I originally did this calculation to help think about how to allocate my campaign contributions. I believe that one can make the biggest difference by donating at the margin, where probabilities for success are 20-80%. To read a discussion click here. This site now gets over 10,000 hits per day.
In addition to Kerry's race, the Senate is within reach. My recommendations for Democratic donations are listed on my ActBlue page.
For those of you wanting to reinforce the national election (to see why, go to the bias analysis above), I recommend the voter registration and turnout organization America Coming Together. For the optimists there is also the DCCC.
To point out the obvious, the converse interpretation of these calculations for Republicans is to direct resources toward the White House and the Senate.
Thanks to Drew Thaler for the site facelift and for the interactive map!