Marc Ratkovic

Department of Politics

Princeton University

Variable Selection Methods


Bicoordinate Descent for the LASSO

We propose an estimator for the LASSO that converges faster than the standard coordinatewise descent algorithm.

With In Song Kim and John Londregan.
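For readers unfamiliar with the baseline, the following is a minimal sketch of the standard coordinatewise descent algorithm that the proposal above is compared against. It is not the bicoordinate method itself, whose details are not given here. The objective scaling, (1/(2n))·||y − Xβ||² + λ·||β||₁, and the function names `soft_threshold` and `lasso_coordinate_descent` are assumptions made for illustration.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; return 0.0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=200, tol=1e-10):
    """Cyclic coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1.

    X is a list of n rows with p entries each; y is a list of n numbers.
    (Assumed objective scaling; hypothetical helper, not the authors' code.)
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta, since beta starts at zero
    # Mean squared value of each column, used to scale the update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of column j with the residual that excludes
            # column j's own current contribution.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            new_value = soft_threshold(rho, lam) / col_sq[j]
            delta = new_value - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_value
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # no coefficient moved noticeably in this sweep
    return beta


# Toy data with orthogonal columns (illustrative values only).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
beta = lasso_coordinate_descent(X, y, lam=0.1)
```

In this toy example the first coefficient is shrunk below its least-squares value of 2 and the second is set exactly to zero, which is the selection behavior the LASSO penalty is designed to produce.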

Estimating Treatment Effect Heterogeneity in Randomized Program Evaluation.

When evaluating the efficacy of social programs and medical treatments using randomized experiments, the estimated overall average causal effect alone is often of limited value and the researchers must investigate when the treatments do and do not work. Indeed, the estimation of treatment effect heterogeneity plays an essential role in (1) selecting the most effective treatment from a large number of available treatments, (2) ascertaining subpopulations for which a treatment is effective or harmful, (3) designing individualized optimal treatment regimes, (4) testing for the existence or lack of heterogeneous treatment effects, and (5) generalizing causal effect estimates obtained from an experimental sample to a target population. In this paper, we formulate the estimation of heterogeneous treatment effects as a variable selection problem. We propose a method that adapts the Support Vector Machine classifier by placing separate sparsity constraints over the pre-treatment parameters and causal heterogeneity parameters of interest. The proposed method is motivated by and applied to two well-known randomized evaluation studies in the social sciences. Our method selects the most effective voter mobilization strategies from a large number of alternative strategies, and it also identifies the characteristics of workers who greatly benefit from (or are negatively affected by) a job training program. In our simulation studies, we find that the proposed method often outperforms some commonly used alternatives.

With Kosuke Imai.

Annals of Applied Statistics, 2013, Vol. 7, No. 1, pp. 443-470.

Finding Jumps in Otherwise Smooth Curves

Many social processes are stable and smooth in general, with discrete jumps. We develop a sequential segmentation spline method that can identify both the location and the number of discontinuities in a series of observations with a time component, while fitting a smooth spline between jumps, using a modified Bayesian Information Criterion statistic as a stopping rule. We explore the method in a large-n, unbalanced panel setting with George W. Bush’s approval data, a small-n time series with median DW-NOMINATE scores for each Congress over time, and a series of simulations. We compare the method to several extant smoothers, and the method performs favorably in terms of visual inspection, residual properties, and event detection. Finally, we discuss extensions of the method.

With Kevin Eng.

Political Analysis, 2010, Vol. 18, pp. 57-77.
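The idea of adding jumps one at a time and stopping by an information criterion can be illustrated with a deliberately simplified sketch. It fits a constant mean within each segment and uses the ordinary Bayesian Information Criterion, so it is not the sequential segmentation spline method or the modified BIC described above. The function names, the `min_seg` parameter, and the parameter count (one mean per segment plus one location per jump) are assumptions for illustration.

```python
import math


def sse(values):
    """Sum of squared deviations from the segment mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def bic(y, breaks):
    """Ordinary BIC for a piecewise-constant fit with the given breaks."""
    n = len(y)
    edges = [0] + sorted(breaks) + [n]
    total = sum(sse(y[a:b]) for a, b in zip(edges, edges[1:]))
    k = 2 * len(breaks) + 1  # segment means plus jump locations (assumed)
    # The floor on the residual sum keeps log() defined for perfect fits.
    return n * math.log(max(total, 1e-12) / n) + k * math.log(n)


def find_jumps(y, min_seg=3):
    """Greedily add the jump that lowers BIC most; stop when none does."""
    n = len(y)
    breaks = []
    best = bic(y, breaks)
    while True:
        candidate = None
        for t in range(min_seg, n - min_seg + 1):
            edges = [0] + sorted(breaks + [t]) + [n]
            # Skip positions that would leave a segment that is too short
            # (this also rules out reusing an existing jump location).
            if any(b - a < min_seg for a, b in zip(edges, edges[1:])):
                continue
            score = bic(y, breaks + [t])
            if candidate is None or score < candidate[0]:
                candidate = (score, t)
        if candidate is None or candidate[0] >= best:
            return sorted(breaks)  # stopping rule: BIC no longer improves
        best = candidate[0]
        breaks.append(candidate[1])


# Toy series with a single level shift at index 6 (illustrative values).
series = [0.0, 0.1, -0.1, 0.05, -0.05, 0.0,
          5.0, 5.1, 4.9, 5.05, 4.95, 5.0]
jumps = find_jumps(series)
```

Each returned index marks the first observation after a jump. Replacing the segment means with smooth fits would move this sketch closer to the approach described in the abstract.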

We introduce a Bayesian method, LASSOplus, that unifies recent contributions in the sparse modelling literatures, while substantially extending upon pre-existing estimators in terms of both performance and flexibility. Unlike existing Bayesian variable selection methods, LASSOplus both selects and estimates effects, while returning estimated confidence intervals among discovered effects. Furthermore, we show how LASSOplus easily extends to modeling repeated observations, and permits a simple Bonferroni correction to control coverage on confidence intervals among discovered effects. We situate the LASSOplus in the literature on exploring sub-group effects, a topic that often leads to a proliferation of estimation parameters. We also offer a simple pre-processing step that draws on recent theoretical work to estimate higher-order effects that can be interpreted independent of their lower-order terms. A simulation study illustrates the method's performance relative to several existing variable selection methods. Application to an existing study of support for climate treaties illustrates the method's ability to discover substantively relevant effects.

With Dustin Tingley.

Implemented through the sparsereg package in the R programming language (link).

Sparse Estimation with Uncertainty: Subgroup Analysis in Large Dimensional Designs