Meta-Analysis of State Polls

Monday, October 11, 2004 at 9:00AM EDT

Median expected outcome*: Kerry 269 EV, Bush 269 EV (Map) (Interactive) (Trends to 9/25)

95% confidence band: Kerry 246-301 EV (Kerry >=270EV: 46%)

Popular Meta-Margin (explanation): Bush leads Kerry by 0.1%

*If Colorado ballot initiative passes, transfer 4 EV from Bush to Kerry.

Pollcalc news:

The map is upgraded to be interactive. This both speeds my efforts and allows you to play out your favorite scenarios. Once again I am indebted to Drew Thaler, this time for modifying this terrific tool. If you have problems running it, write me with what browser and operating system you are using, and (if you know how to find it) your Java console output (i.e. Java messages). Note that he has added the Upper Peninsula of Michigan - citizens up there are now safe and dry.

3:00PM: The interactive map is crashing some browsers, including Firefox, so it's gone for a little while. If you want to see it, go to this page.

Lucia B., a student in Operations Research and Finance, has started to help with data updates and analysis. Welcome and thanks to Lucia!

Commentary (Recent comments)

The new polls today are from Opinion Research and the five Rasmussen tracking polls (FL, MI, MN, OH, PA). Kerry drops slightly, mainly because polls more than seven days old have been dropped.

Although the median is a tie, this is just the midpoint of likely outcomes. The actual probability of an exact electoral tie today is only 4%. The Popular Meta-Margin, Bush +0.1%, quantifies how close things are. As I have said over and over, it's currently all about turnout! All key polls were taken in the days after the first debate. A strong predictor of victory is winning two or three of the following states: FL, OH, and PA. Kerry is ahead in PA (6 polls), Bush is ahead in FL (6 polls), and OH is tied (4 polls). [sources]

Over at Race 2004, Stephen points out that Colorado Democrats tend to support Amendment 36, while Republicans are opposed. He thinks this might become a tricky choice for partisans if sentiment shifts towards Kerry. I disagree. Optimally, Democrats and Republicans should stick with their preferences even if Colorado swings into the Kerry column. Here is why.

States tend to move in the same direction in opinion, and Colorado (Bush +5.7% in recent polls) tends to be a lower-likelihood state for Kerry than Ohio (Kerry +0.02%) and Florida (Bush +2.7%). Therefore, if the national election is close enough for Colorado to make a difference, it is likely to go for Bush. In this case, Amendment 36 will provide four EV for Kerry, votes that might be critical. Conversely, if opinion swings enough for Colorado to go for Kerry, then Ohio or Florida will be the deciding state, and losing four electoral votes will not matter. Therefore, passage of Amendment 36 is most likely to either help the Democrats or have no effect.

Relevant to the Cheney-Edwards debate: campaign satire, possibly amusing to both sides.

This site is hard work.

The potential effects of differential turnout, shift in opinion, or polling bias are as follows. (It is my opinion that registration/turnout activities will generate a 2-point effective swing toward Kerry on Election Day.)

3 point swing to Kerry: Kerry 311 EV, Bush 227 EV, Kerry win 99%.
2 point swing to Kerry: Kerry 295 EV, Bush 243 EV, Kerry win 93%.
1 point swing to Kerry: Kerry 284 EV, Bush 254 EV, Kerry win 74%.
1 point swing to Bush: Kerry 257 EV, Bush 281 EV, Bush win 80%.
2 point swing to Bush: Kerry 249 EV, Bush 289 EV, Bush win 94%.
3 point swing to Bush: Kerry 240 EV, Bush 298 EV, Bush win 99.2%.

Although the calculation is unbiased, I am not. I am a Democrat. To see a list of races I consider critical, see my ActBlue page. My advice to all voters (including Republicans) is the same: Go to battleground states. Register voters. Make phone calls and knock on doors (a very effective strategy) to canvass for voters. Vote absentee or vote early (online resource), and on Election Day, work to get out the vote.

State-by-State Probabilities

Current percentage probabilities of a Kerry win in each battleground state, computed from the last three polls. States in boldface had a new poll completed and released since Saturday. The probabilities are calculated assuming that the SEM cannot go below 2%. Click on a state to view a tabulation of most of the polls. Some of the others come from these data sources (some are subscription-only). Other sources are electoral-vote.com and RealClearPolitics. All data are visible in this MATLAB script.

Rank order of the battleground states. States currently in play in the 20-80% probability range, which indicates a near-tie, are in bold.

Democratic <- MI/WA/MN/OR/NJ/PA/NH(95-100%) / IA / ME / NM / WI / OH / NV / MO / FL / TN / AR/CO/AZ/VA/NC/WV(0-5%) -> Republican

The map below shows states that, individually, each candidate would have a greater than 50% chance of winning if the election were held today. This is a different calculation from the median EC projection above. A map cannot easily show compound events (for example, when a candidate pulls out a surprise victory in one state but loses another), but the above probability projections take compound events into account. Thus the EC totals on the map may differ from what's given above. The closer a state is to a tossup, the closer it will be to white.

State-by-state electoral map

This is a history of the calculation. For this calculation each poll is assigned to the last date on which polling was done. The marked events are inspired by a similar graph by electoral-vote.com. In my graph, the effect of events is clearer because I use polling margins and because I average over three polls. Fahrenheit 9/11, adding John Edwards to the ticket, and the Democratic convention seemed to have measurable effects within a few days. The passing of Ronald Reagan and the assault on Kerry's war heroism did not. The last update was September 25.

History of meta-analysis over time

Selected comments from the author (all comments)

October 6, 2004 (noon)

Possible clues to whether the bounce has occurred in key states: Florida, Iowa, and Ohio polls show Kerry ahead for the first time in weeks, though by tiny margins. Other polls are generally in the direction of Kerry. Nationally, Kerry's bounce is 5 +/- 2 points (mean +/- SEM), comparing the six polls immediately after the debate with the six polls before. [National polls] This is much larger than the current Meta-Margin, suggesting the potential for a large change in EV standings.

I am now setting the floor for standard error of the mean at 2 points instead of 1, in order to more realistically capture uncertainty in cases where polls happen to be near one another. This does not change things much, except for the probability calculations. Which you are supposed to take with a big grain of salt, remember?

Usually over the last few months, these indices have been mismatched: at a certain point in opinion swing, Bush could very well win the popular vote but lose the Electoral College. In other words, this year the Electoral College mechanism seems to favor Kerry.

Relevant to the Cheney-Edwards debate: campaign satire, possibly amusing to both sides. Cheney is less gaffe-prone than Bush, but he is still a skillful deceiver.

Electoral-vote.com is back to using only one poll per state. Bad idea!

October 5, 2004

As previously mentioned, I have been looking at the party-ID numbers in the Gallup data. I have found evidence that party ID is not fixed over time. The Gallup poll internal numbers contain the fraction of voters who call themselves Republican, Democratic, or independent. The average GOP fraction is 39%, but it fluctuates. The fluctuation has been the source of much discussion and is said to be too high. As it turns out, the amount of fluctuation can be predicted from binomial statistics if the fraction of Republicans (for instance) is fixed over time. The expected standard deviation is sqrt(r*(1-r)/N), where r is the fraction 0.39 and N is the number of people per poll, about 1000. These numbers predict a standard deviation of 1.5%. From Gallup's data, the actual standard deviation is 2.9%, almost twice this. This suggests that party ID, as Gallup measures it, shifts over time. This supports Gallup's defense that weighting by party ID distorts the result.
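The binomial estimate above can be checked in a few lines. This is a minimal sketch that uses only the figures quoted in the text (r = 0.39, N = 1000, observed SD = 2.9%); the function name `binomial_sd` is made up for illustration.

```python
import math


def binomial_sd(fraction: float, sample_size: int) -> float:
    """Standard deviation of a sampled fraction under pure binomial sampling."""
    return math.sqrt(fraction * (1.0 - fraction) / sample_size)


r = 0.39             # average GOP fraction quoted in the text
n = 1000             # approximate respondents per poll, as quoted in the text
observed_sd = 0.029  # observed standard deviation quoted in the text

expected_sd = binomial_sd(r, n)
print(f"expected SD from sampling alone: {expected_sd:.1%}")
print(f"observed / expected: {observed_sd / expected_sd:.2f}")
```

The ratio comes out a little under 2, which is the "almost twice" in the text.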

However, using unweighted data has its own problem, namely that the sample may be consistently biased in one direction or the other. In 2000, Rasmussen did not weight and predicted a margin that was 9 points more favorable to Bush than the final outcome. This is the accusation currently being made against Gallup.

But the cure may be as bad as the disease, as exemplified by Rasmussen's new approach. Rasmussen now weights, and now his presidential tracking poll fluctuates very little. Because party ID and preferred candidate (Bush/Kerry) are strongly correlated, this means that his weighting procedure will always work to reduce the margin of the leading candidate. This may explain why his poll is so stable - statistically, too stable to be right. In recent 3-day tracking data (analyzing every third day) the standard deviation is 0.7%, whereas random fluctuation alone predicts an SD of 1.6%.

The real problem with weighting is as follows: The horserace result depends on assumptions on party ID. If these covary with sentiment, then real changes will be filtered out, and it will be very hard to learn from weighted data on who is ahead, a basic fact we want from polls. We can see an example of this today because a recent poll from Zogby shows little change from the previous poll.

Therefore I currently think that both weighting by party ID (Rasmussen now) and not weighting at all (Gallup now, Rasmussen in 2000) have serious problems. A better way to weight would be to use a question or questions with fixed answers, such as "Who did you vote for in the last election, Bush, Gore or Nader?" Time magazine does this, but does not weight. Of course, the unreliability of memory is a problem. Zogby Interactive does a sensible version of this: party ID is asked at a different time than candidate preference, which might de-link the variables. If anyone knows of other organizations that go beyond simple party-ID-weighting, please let me know.

October 2, 2004

The median electoral vote (EV) estimate is very sensitive to swings in reported opinion because of the winner-take-all mechanism of awarding EV. Under near-tie conditions I find that the change is about 30 EV per 1-point change in the popular margin. Therefore Kerry's approximately 110-EV slide since mid-August represents a 4-point swing, equivalent to 2% of voters switching from Kerry to Bush. October 5 and 9 corrections: Looking at the numbers more carefully, from August 1 to mid-September the national popular margin swung by about 8 points, 4% of voters switching. This works out to a gain of 12-15 EV for one candidate per 1-point change in margin or turnout. For comparison, past EV outcomes were 2000 (Bush) 271-266, 1996 (Clinton) 379-159, 1992 (Clinton) 370-168, 1988 (Bush elder) 426-111-1, 1984 (Reagan) 525-13, 1980 (Reagan) 489-49, 1976 (Carter) 297-240-1. October 9: Historically, the Electoral College margin has shown, on average, a 29 EV margin per 1% popular margin. This is consistent with my calculations this year. Since June, neither Kerry nor Bush has gotten much past 320 EV, demonstrating that this is the close race that both campaigns have predicted all along.

In addition to the final polls, the outcome will ultimately be effectively adjusted by three big factors: (a) undecideds, (b) new voter registration, and (c) turnout. Undecideds usually break for the challenger, though this is not certain. Newly registered voters should in principle be reflected in polls, though how many of them pass likely-voter criteria is unclear. Turnout is a big unknown (though a known one). In 2000 Democrats did better than expected from pre-election polls. In 2002, Republicans did better. This year, unusual levels of progressive activism would seem to favor Democrats. But prediction is hard, especially of the future.

September 29, 2004

Relevant to the current Gallup controversy: Here is a table of Gallup national presidential polls, along with Party ID statistics for each poll. The GOP-Democratic margin in the poll correlates quite closely with the Bush-Kerry margin. In fact, the correlation coefficient is 0.73 (r^2=0.53, P<0.001). Put into lay terms, this means that Party ID gap and Bush-Kerry margin vary together, and variation in one can account for over half the variation in the other. The slope of a linear fit between the two is near 1: on average, every extra Republican in the sample added one to Bush's margin.

This seems consistent with the idea that how much Gallup samples from each group (Republicans, Independents, Democrats) affects poll outcome. But could it be the other way around: could voter sentiment affect self-reported party ID? One test of this is to see if the group sizes fluctuate as much as would be expected by chance. For a sample of 1000 voters composed of 39% R, 34% D, and 25% I (the average of all those polls), the percentage of R's would be expected to have a standard deviation of 1.5%. The actual SD is almost twice as large, 2.9%. Therefore party ID does vary more than sampling error would suggest. Maybe it varies with sentiment. Or maybe conditions at Gallup change (for instance, the time of day and week that calls were made). Hard to know. One thing is clear: their average party ID breakdown does not match known values (35% R, 38-39% D, 26% I).

Gallup Party ID bias

September 27, 2004

This week's focus is registration of overseas voters. Seven million American citizens live abroad, and most do not vote. This year they could be decisive. The first deadline for applications to be received by county registrars is October 2, and thirty states have deadlines October 5 or earlier. So people must hurry!

Any US citizen living abroad can register and vote, even if it's been decades since they lived in the US, even if they never voted, even if they were born to US citizens and never lived in the US. For them, here is a registration and application form. Other useful web sites are www.overseasvote2004.com and the Pentagon's site. Finally, expatriates who do not receive an absentee ballot can use a Federal Write-In Ballot. Otherwise they must depend on the state and the mail services to get them the ballot in time.

Within the US, the registration effort has been intense (read about efforts in FL and OH and in CO). In these states I estimate that efforts have increased registration of Democrats over Republicans by about 0.4% of total voters. To do your part for your side, here are forms.

August 23, 2004

You can use the bias calculation to estimate where things are headed. If you think turnout will boost your candidate by N points, add that. If you think that one candidate will gain X points at the expense of the other, add 2*X. For instance, if turnout will increase Kerry's vote by 2 points, but Bush will pick up 1.5% of voters from Kerry, then the bias is 2 - (1.5 * 2) = -1%, or 1% to Bush.

August 20, 2004

For margin of error junkies: Rachel Findley pointed out this comparison of polls to election outcomes, which finds that only 84% of election outcomes fall within the reported 95% confidence interval. This discrepancy allows a way to estimate polling errors that go beyond sampling error. If the additional error is normally distributed, an appropriate correction would be to increase the reported margin of error by a factor of 1.4.

However, this correction does not apply to my calculation because instead of relying on reported MoE, I use inter-poll data to make an independent estimate of variance.
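For readers who want to see where a factor of about 1.4 comes from, here is a minimal sketch. It assumes, as the entry above does, that the extra error is normally distributed, so that a nominal 95% interval which covers only 84% of outcomes is too narrow by the ratio of the two normal quantiles. The function name `moe_inflation` is invented for this illustration.

```python
from statistics import NormalDist


def moe_inflation(nominal_coverage: float, observed_coverage: float) -> float:
    """Factor by which a reported margin of error should be widened.

    Assumes normally distributed errors: the reported interval spans
    +/- z_nominal reported SDs but only achieves the observed coverage,
    so the true SD is z_nominal / z_observed times the reported one.
    """
    std_normal = NormalDist()
    z_nominal = std_normal.inv_cdf(0.5 + nominal_coverage / 2.0)
    z_observed = std_normal.inv_cdf(0.5 + observed_coverage / 2.0)
    return z_nominal / z_observed


factor = moe_inflation(nominal_coverage=0.95, observed_coverage=0.84)
print(f"inflate reported MoE by about {factor:.2f}")
```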

Methods

These calculations are based on state polls from many polling organizations (data sources). The primary source is Race 2004, which emphasizes likely voters (LV). Whether Nader is included depends on his ballot status in that state; if unknown then he is included. Polling organizations that provide rolling averages (Rasmussen FL, MI, MN, OH, PA) are updated twice a week. The data are fed through a MATLAB script to mathematically compute all of the above results.

The first step is to calculate the probability of winning a state, taking into account the variability of polls. This is done by calculating simple statistics on the polls: average and standard error of the mean (SEM). This is then converted to a probability of a win using the normal distribution (bell-shaped curve).
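The first step can be sketched as follows. This is an illustration under stated assumptions, not the MATLAB script itself: margins are taken as Kerry minus Bush in percentage points, the SEM is the sample standard deviation divided by sqrt(n), and the 2-point floor on the SEM is the one described elsewhere on this page. The poll margins in the example are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

SEM_FLOOR = 2.0  # points; floor on the standard error of the mean


def state_win_probability(margins, sem_floor=SEM_FLOOR):
    """Probability that the true margin is positive, given recent poll margins.

    margins: poll margins in points (candidate minus opponent).
    """
    avg = mean(margins)
    sem = stdev(margins) / math.sqrt(len(margins))
    sem = max(sem, sem_floor)  # never claim more certainty than the floor allows
    # Normal approximation: P(true margin > 0) = Phi(avg / sem)
    return NormalDist().cdf(avg / sem)


# Hypothetical last three polls in one state (Kerry minus Bush, in points)
example_margins = [3.0, 1.0, -1.0]
print(f"win probability: {state_win_probability(example_margins):.1%}")
```

In this example the raw SEM is about 1.2 points, so the floor applies and the 1-point average lead becomes a probability of roughly 69%.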

The second step of the calculation is complex: it calculates the probability of every possible outcome. For 17 states the total number of possibilities is 2^17 = 131,072. Adding Colorado, Tennessee, North Carolina and Virginia makes over 2 million possibilities. In order to reduce computing time, probabilities less than 0.1% or greater than 99.9% are classified as certain outcomes. Each possibility corresponds to a different number of electoral votes (EV).

Those are then tabulated to come up with a 50th-percentile (expected) outcome, as well as a 95-percent confidence interval. The 95-percent confidence interval is particularly useful because, like the famous Margin of Error (MoE), it gives the range of outcomes that would occur 95 percent of the time based on the available information. Note that this confidence interval is very similar to the 50th-percentile outcome from a 1-point bias towards Bush or towards Kerry.
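The enumeration and tabulation steps can be sketched for a toy case. The three states, their win probabilities, and the `safe_ev` total below are hypothetical, and the function names are invented for this illustration; the sketch also treats states as independent, which is an assumption.

```python
from itertools import product


def ev_distribution(states, safe_ev=0):
    """Enumerate every win/loss combination and total its probability.

    states: list of (electoral_votes, win_probability) pairs.
    safe_ev: electoral votes treated as certain for the candidate.
    Returns a dict mapping EV total to probability.
    """
    dist = {}
    for outcome in product((0, 1), repeat=len(states)):
        prob = 1.0
        total = safe_ev
        for won, (ev, p) in zip(outcome, states):
            prob *= p if won else 1.0 - p
            total += ev * won
        dist[total] = dist.get(total, 0.0) + prob
    return dist


def percentile(dist, q):
    """Smallest EV total whose cumulative probability reaches q."""
    cumulative = 0.0
    for ev in sorted(dist):
        cumulative += dist[ev]
        if cumulative >= q:
            return ev
    return max(dist)


# Hypothetical battlegrounds: (EV, probability of a win)
toy_states = [(27, 0.4), (20, 0.5), (21, 0.9)]
dist = ev_distribution(toy_states, safe_ev=200)
print("median EV:", percentile(dist, 0.5))
print("95% band:", percentile(dist, 0.025), "to", percentile(dist, 0.975))
```

The loop visits 2^n combinations, which is why the number of possibilities grows so quickly as states are added.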

Although this calculation takes into account the variability of polls, it is important to note what it does not do. It is integrated over the last three polls (mostly 1-4 weeks), so fast swings do not show up. It does not reject any polls, nor does it account for potential bias or predict future opinion shift in any way.

October 6: I am now using a far faster way of calculating the probability distribution in closed form (see discussion). Thanks to Lee of Quant Consulting.
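The closed-form method is not spelled out on this page. One standard way to get the same distribution without enumerating every combination, offered here only as an assumption about what such a method could look like, is to multiply out one polynomial per state, (1 - p) + p*x^EV, so that the coefficient of x^k is the probability of exactly k electoral votes. The states and `safe_ev` value are hypothetical.

```python
def ev_distribution_fast(states, safe_ev=0):
    """EV probability distribution by repeated convolution.

    states: list of (electoral_votes, win_probability) pairs.
    Returns a list where index k holds P(exactly k electoral votes).
    """
    total_ev = safe_ev + sum(ev for ev, _ in states)
    dist = [0.0] * (total_ev + 1)
    dist[safe_ev] = 1.0  # before any battleground is counted
    for ev, p in states:
        updated = [0.0] * (total_ev + 1)
        for total, prob in enumerate(dist):
            if prob == 0.0:
                continue
            updated[total] += prob * (1.0 - p)  # state lost
            updated[total + ev] += prob * p     # state won
        dist = updated
    return dist


toy_states = [(27, 0.4), (20, 0.5), (21, 0.9)]
dist = ev_distribution_fast(toy_states, safe_ev=200)
print("P(exactly 241 EV):", round(dist[241], 4))
```

The work grows with (number of states) x (total EV) rather than doubling with each added state.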

Poll selection

Polls are unfiltered and equally weighted, in part because selecting data leads to unintended biases. Therefore even though some polling organizations give demonstrable and consistent outlier results, such as state-level Fox News polls (example), Fairleigh Dickinson Public Mind, and the Badger Poll, all polls are still included. I have also included Zogby Interactive polls, which have relatively untested methods but do not show measurable bias in either direction from the average.

However, even when all of the above polls are excluded the result is virtually identical. Thus the method can be tailored but is also robust enough to give a reasonable answer even with no selection of data. For a fuller discussion of my methods, see this DailyKos thread.

Bias

These calculations would be affected if there is an overall poll bias, which can have a large effect in a close race. Bias could happen if polling methods do not accurately sample actual voting patterns. However, in 2000, Ryan Lizza at The New Republic compiled state polls. On the day before the election, that compilation indicated that the outcome would hinge on Florida. This matches what happened, arguing against major built-in biases in state polls.

Other factors may have an effect of unknown size, such as increased motivation by Democrats or the possibility that undecided voters will break against the incumbent.

A key measure of the current closeness of the race is the Popular Meta-Margin (a.k.a. Swing Index). This is the across-the-board percentage shift in opinion (or poll bias) that would be needed to make the electoral college an exact toss-up. This is analogous to the popular margin in national polls, but is more relevant to what it would take in terms of real electoral mechanisms.
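One way to compute such a shift is a simple bisection search, sketched below. Several things here are assumptions made for illustration: "exact toss-up" is taken to mean a 50% chance of reaching 270 EV, every state gets the same 2-point SEM, states are independent, and the state margins and `safe_ev` value are hypothetical. The function names are invented.

```python
from statistics import NormalDist


def win_probability(states, shift, safe_ev, needed=270, sem=2.0):
    """P(reaching `needed` EV) after adding `shift` points to every margin.

    states: list of (electoral_votes, margin_in_points) pairs.
    """
    total_ev = safe_ev + sum(ev for ev, _ in states)
    dist = [0.0] * (total_ev + 1)
    dist[safe_ev] = 1.0
    for ev, margin in states:
        p = NormalDist().cdf((margin + shift) / sem)
        updated = [0.0] * (total_ev + 1)
        for total, prob in enumerate(dist):
            if prob:
                updated[total] += prob * (1.0 - p)
                updated[total + ev] += prob * p
        dist = updated
    return sum(dist[needed:])


def tossup_shift(states, safe_ev, lo=-10.0, hi=10.0, tol=1e-4):
    """Across-the-board shift (points) that brings the win probability to 50%."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if win_probability(states, mid, safe_ev) < 0.5:
            lo = mid  # not enough: the candidate needs a larger shift
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical battlegrounds: (EV, candidate's margin in points)
toy_states = [(27, -2.7), (20, 0.0), (21, 3.0)]
shift = tossup_shift(toy_states, safe_ev=230)
print(f"shift needed for a toss-up: {shift:+.2f} points")
```

A positive result means the candidate is behind by that many points in the sense that matters for the Electoral College; a negative result means the candidate could lose that much ground and still be at even odds.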

Bias occurs if (a) polling organizations give skewed results (on Election Eve 2000, they favored Bush by 2.5%) or if (b) one side turns its voters out better or worse than predicted. The closeness of the race for the last few months (less than 3% in either direction) indicates a heavy role for registration of new voters and election-day turnout.

Predicting the future

You can use the bias calculation to estimate where things are headed. If you think turnout efforts will boost your candidate by N points, add that. If you think that one candidate will gain X points at the expense of the other, add 2*X.

For instance, if you predict that turnout will increase Kerry's vote by 2 points, but Bush will pick up 1.5% of voters from Kerry, then the bias to use is 2 - (1.5 * 2) = -1%, or 1% to Bush.
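The same arithmetic as a small function, in case you want to try your own numbers. The name `net_bias` and its parameter names are invented for this sketch; the factor of 2 reflects that each voter who switches sides moves the margin by two points, one lost by one candidate and one gained by the other.

```python
def net_bias(turnout_boost: float, voters_switching: float) -> float:
    """Net swing in points toward your candidate.

    turnout_boost: points added to your candidate by turnout efforts.
    voters_switching: points of voters moving to your candidate from the
        other one (negative if they are moving away).
    """
    return turnout_boost + 2.0 * voters_switching


# The example from the text: turnout adds 2 points for Kerry,
# but 1.5% of voters move from Kerry to Bush.
kerry_bias = net_bias(turnout_boost=2.0, voters_switching=-1.5)
print(f"bias toward Kerry: {kerry_bias:+.1f} points")
```

A negative result is a bias toward the other candidate, here 1 point to Bush.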

Site History

My academic specialties are biophysics and neuroscience. In these fields I make heavy use of probability and statistics in analyzing complex experimental data, and have published many papers using these approaches. Polling provides an interesting everyday example.

I originally did this calculation to help think about how to allocate my campaign contributions. I believe that one can make the biggest difference by donating at the margin, where probabilities for success are 20-80%. To read a discussion click here. This site now gets over 10,000 hits per day.

In addition to Kerry's race, the Senate is within reach. My recommendations for Democratic donations are listed on my ActBlue page.

For those of you wanting to reinforce the national election (to see why, go to the bias analysis above), I recommend the voter registration and turnout organization America Coming Together. For the optimists there is also the DCCC.

To point out the obvious, the converse interpretation of these calculations for Republicans is to direct resources toward the White House and the Senate.

Thanks to Drew Thaler for the site facelift!

Other election resources