"Parametric and Nonparametric Bayesian Models for Ecological Inference in 2 x 2 Tables"

 

  Abstract

The ecological inference problem arises when making inferences about individual behavior from aggregate data. This situation is frequently encountered in the social sciences and epidemiology. In this article, we propose a Bayesian approach based on data augmentation. We formulate ecological inference in $2 \times 2$ tables as a missing data problem in which only the weighted average of two unknown variables is observed. This framework directly incorporates the deterministic bounds, which contain all the information available from the data, and allows researchers to incorporate individual-level data whenever they are available. Within this general framework, we first develop a parametric model. We show that, by using an EM algorithm, the model can formally quantify the effect of missing information on parameter estimation. This quantity is an important diagnostic for evaluating the severity of aggregation effects. Next, we introduce a nonparametric Bayesian model that uses a Dirichlet process prior to relax the distributional assumption of the parametric model. Through simulations and an empirical application, we evaluate the relative performance of our models and other existing methods. We show that in many realistic scenarios, aggregation effects are so severe that more than half of the information is lost, yielding estimates with little precision. We also find that our nonparametric model generally outperforms the parametric models. C code, along with an R interface, is publicly available for implementing our Markov chain Monte Carlo algorithms to fit the proposed models. (Last Revised November 30, 2004)
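To make the "weighted average of two unknown variables" and the "deterministic bounds" concrete, here is a minimal sketch for a single $2 \times 2$ table. It assumes notation that the abstract does not spell out: observed margins $x$ and $y$, unknown proportions $w_1, w_2 \in [0, 1]$, and the identity $y = w_1 x + w_2 (1 - x)$. The function name `deterministic_bounds` is hypothetical; this is an illustration of the bounds only, not the authors' C code or R interface.

```python
def deterministic_bounds(x: float, y: float):
    """Bounds on the unknown proportions (w1, w2) of one 2 x 2 table.

    Assumed notation (illustrative): y = w1 * x + w2 * (1 - x), where x and y
    are the observed margins and w1, w2 in [0, 1] are unobserved.
    Returns ((w1_lo, w1_hi), (w2_lo, w2_hi)).
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y must lie in [0, 1]")

    # w1 is unconstrained by the data when the first group is empty (x = 0).
    if x > 0.0:
        # w1 is largest when w2 = 0 and smallest when w2 = 1.
        w1 = (max(0.0, (y - (1.0 - x)) / x), min(1.0, y / x))
    else:
        w1 = (0.0, 1.0)

    # Symmetric argument for w2, whose weight is (1 - x).
    if x < 1.0:
        # w2 is largest when w1 = 0 and smallest when w1 = 1.
        w2 = (max(0.0, (y - x) / (1.0 - x)), min(1.0, y / (1.0 - x)))
    else:
        w2 = (0.0, 1.0)
    return w1, w2


if __name__ == "__main__":
    (w1_lo, w1_hi), (w2_lo, w2_hi) = deterministic_bounds(x=0.8, y=0.9)
    # prints: w1 in [0.875, 1.000], w2 in [0.500, 1.000]
    print(f"w1 in [{w1_lo:.3f}, {w1_hi:.3f}], w2 in [{w2_lo:.3f}, {w2_hi:.3f}]")
```

In this sketch the bounds narrow only when a margin is extreme; for intermediate margins they can span much of $[0, 1]$. That is one way to see why aggregation can discard a large share of the information about $w_1$ and $w_2$.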

© Kosuke Imai
  Last modified: Wed Aug 3 23:32:05 EDT 2005