Predicting and Dissecting the Seats-Votes Curve in the 2006 U.S. House Election (with Andrew Gelman and Jamie Chandler). 2008. PS: Political Science & Politics. 41(1):139-145.




Abstract: The Democrats' victory in the 2006 election has been compared to the Republicans' in 1994. But the Democrats actually did a lot better in terms of the vote. The Democrats received 54.8% of the average district vote for the two parties in 2006, whereas the Republicans only averaged 51.6% in 1994. The 2006 outcome for the Democrats is comparable to their typical vote shares as the majority party in the decades preceding the 1994 realignment. Nevertheless, the size of the Democrats' victory in the 2006 House elections has obscured the sizable structural disadvantages they faced heading into the elections. In this paper we document the advantages the Republicans had, examine how and to what extent the Democrats overcame them, and offer predictions as to whether the results of the 2006 election leveled the electoral playing field for 2008. Our calculations showed that the Democrats needed at least 52% of the vote to have an even chance of taking control of the House of Representatives.

Prior to the election we estimated the seats-votes curve for 2006 by constructing a model to predict the 2006 election from 2004, and then validating the method by applying it to previous elections (predicting 2004 from 2002, and so forth). We found that the Democrats in 2006 were always destined to receive fewer seats than their corresponding average vote share. They were able to gain control of the House by winning the largest average district vote by either party since 1990. Has the 2006 election removed the Republicans' structural advantages? While Republicans continue to win more close races, a preliminary analysis of the 2008 election suggests that the switch in incumbency advantage from the Republicans to the Democrats may nevertheless level the electoral playing field.

Click here to download a pdf copy of the paper.

A pre-election version of the paper is available here.


Replication Information


With the data and code described below, researchers can replicate our results and use the data for further study.  Note that all the files referenced below, including csv versions of the datasets, can be found in this zip file.



We used three datasets in the paper: a district-level dataset containing information on every race in each House election from 1946 to 2004; an aggregate-level dataset containing information on the total number of votes and seats gained by each party in the same elections; and a dataset containing information on each district that we used to make predictions for the 2006 election.

a)     Individual House Races Data, 1946-2004

This dataset, which was given to us by Gary Jacobson, contains various information on every House race from 1946-2004, such as the vote share of the Democratic candidate and incumbency status; complete coding information is available here.  We modified and recoded this data using this Stata do-file.  Coding information for the updated dataset, which we use for the analysis that appears in the paper, is available here.

b)    Aggregate House Data, 1946-2004

This dataset, which was compiled from data available from the Clerk of the House, contains aggregate information (in terms of seats and votes) for every House election from 1946-2004.  Coding information is available here.

c)     Individual House Race Data for Predicting 2006

This dataset contains information about the 2006 election, including incumbency status and lagged vote leading up to the election, along with information about the winner and vote margins in the 2006 election.  Data on 2004 vote shares and incumbency status was based on Jacobson’s data.  Data on incumbency status and retirements was taken from various news sources in the months leading up to the election (see paper for references).  The 2006 election results were supplied to us by Walt Borges, who gathered the official certified results of every state, which we then confirmed independently.  Coding information is available here.

Statistical Code

All statistical analysis that appears in the paper was conducted using R.  Complete, annotated code is available here.