Statistical Machine Learning Lab

Princeton University

Research Overview

Effective data analysis at the required scale will be one of the greatest challenges of modern information society. To meet this challenge, we need to understand complexity, both informational and computational. My research lies at the intersection of modern statistics and machine learning. In particular, I am interested in developing flexible nonparametric and semiparametric methods and applying them to complex scientific datasets. My theoretical research interests include:

  •        Nonparametric Functional Sparsity
  •        Semiparametric Structural Sparsity
  •        Model-Based Optimization
  •        Large-Scale Calibrated Inference
  •        Theoretical Foundations of High Dimensional Inference

My applied research interest is to develop a unified set of computational, statistical, and software tools for extracting and interpreting significant information from data collected across a variety of scientific areas. Current projects include:

  •        Brain Image Data Analysis via Semiparametric Graphical Models
  •        Genomic Network Analysis and Topic Modeling

Postdoc Position Available: one postdoc position is available. Please contact

    Nonparametric Functional Sparsity


    Accurately estimating a high dimensional density or regression function is fundamentally important in statistical learning. To avoid the curse of dimensionality, we assume the target function depends on only a small number of variables. Under this functional sparsity assumption, we have developed nonparametric methods for density estimation, regression, classification, and graph estimation. We are currently exploring new methods for functional ANOVA models.
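
    As a toy illustration of functional sparsity (a sketch in the spirit of the Sparse Additive Models paper below, not the lab's implementation), the snippet fits an additive model by backfitting with a kernel smoother and soft-thresholds each component function's empirical norm, so that irrelevant components are set exactly to zero. All function names, bandwidths, and tuning constants here are illustrative.

```python
import numpy as np

def kernel_smooth(x, r, bandwidth=0.2):
    """Nadaraya-Watson smoother of residual r against covariate x."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w @ r) / w.sum(axis=1)

def spam_backfit(X, y, lam=0.05, n_iter=20):
    """Backfitting for a sparse additive model: smooth each partial
    residual, then soft-threshold the component's empirical norm."""
    n, d = X.shape
    F = np.zeros((n, d))                      # fitted values of each f_j
    for _ in range(n_iter):
        for j in range(d):
            r = y - F.sum(axis=1) + F[:, j]   # partial residual for f_j
            f = kernel_smooth(X[:, j], r)
            f -= f.mean()                     # center for identifiability
            norm = np.sqrt(np.mean(f ** 2))
            # soft-thresholding at the function level induces sparsity
            F[:, j] = max(0.0, 1.0 - lam / norm) * f if norm > 0 else 0.0
    return F

# toy data: y depends on only the first of five covariates
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 5))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(200)
F = spam_backfit(X, y)
selected = [j for j in range(5) if np.mean(F[:, j] ** 2) > 0]
```

    Thresholding the whole component function, rather than individual coefficients, is what lets the procedure discard irrelevant variables while fitting the relevant ones nonparametrically.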

    Selected Publications

    CODA: Copula Discriminant Analysis

    Fang Han, Tuo Zhao, and Han Liu

    Journal of Machine Learning Research (JMLR), 14:629-671, 2013.

    Forest Density Estimation

    Han Liu, Min Xu, Haijie Gu, Anupam Dasgupta, John Lafferty, and Larry Wasserman

    Journal of Machine Learning Research (JMLR), 12:907-951, 2011.

    Sparse Additive Models

    Pradeep Ravikumar, John Lafferty, Han Liu, Larry Wasserman

    Journal of the Royal Statistical Society: Series B (Statistical Methodology) (JRSSB), 2009.

    » Learn more

    Semiparametric Structural Sparsity

    Semiparametric models contain both finite- and infinite-dimensional parameters. For inference, interest often lies only in estimating the finite-dimensional parameters, with the infinite-dimensional parameters treated as a nuisance. One can therefore impose structural sparsity assumptions directly on the finite-dimensional parameters. This approach allows us to gain statistical efficiency and modeling flexibility simultaneously.
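
    As a concrete example, the nonparanormal model (JMLR 2009 paper below) replaces each variable by its Gaussian score before estimating the graph; the marginal transforms are the infinite-dimensional nuisance, and the correlation structure is the target. Below is a minimal sketch of that marginal transform using the Winsorized empirical CDF; the demo data and names are illustrative.

```python
import numpy as np
from scipy.stats import norm, rankdata

def nonparanormal_transform(X):
    """Map each column to Gaussian scores through its Winsorized
    empirical CDF, as in the nonparanormal estimator."""
    n = X.shape[0]
    # truncation level delta_n = 1 / (4 n^{1/4} sqrt(pi log n))
    delta = 1.0 / (4.0 * n ** 0.25 * np.sqrt(np.pi * np.log(n)))
    U = rankdata(X, axis=0) / (n + 1)     # empirical CDF values in (0, 1)
    U = np.clip(U, delta, 1.0 - delta)    # Winsorize the tails
    return norm.ppf(U)

# demo: latent Gaussian pair with correlation 0.6, marginals distorted
# by monotone maps; the transform recovers the latent correlation
rng = np.random.default_rng(0)
Z = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=500)
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3])
rho = np.corrcoef(nonparanormal_transform(X).T)[0, 1]
```

    The correlation matrix of the transformed data can then be handed to a standard Gaussian graph estimator such as the graphical lasso.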

    Selected Publications

    High Dimensional Semiparametric Bigraphical Model

    Yang Ning and Han Liu

    Biometrika. To appear. 2013.

    High Dimensional Semiparametric Gaussian Copula Graphical Models

    Han Liu, Fang Han, Ming Yuan, John Lafferty, and Larry Wasserman

    The Annals of Statistics, Volume 40, No. 4, pp. 2293-2326, 2012.

    Sparse Nonparametric Graphical Models

    John Lafferty, Han Liu, and Larry Wasserman

    Statistical Science, Volume 27, No. 4, pp. 519-537, 2012.

    The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs

    Han Liu, John Lafferty, and Larry Wasserman

    Journal of Machine Learning Research (JMLR), 10:2295-2328, 2009.

    » Learn more

    Model-Based Optimization

    Model-based optimization is a new area that lies at the intersection of statistical learning, robust optimization, and stochastic optimization. We apply ideas from modern probability and statistics to solve large and complex optimization problems. In particular, we develop a rigorous theoretical framework to characterize the interaction between informational and computational complexity.
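
    For intuition on the nonconvex learning problems studied here, the sketch below contrasts the l1 soft-thresholding operator with the SCAD thresholding operator of Fan and Li: both kill small coefficients, but SCAD leaves large coefficients unshrunk, removing the bias of the l1 penalty. This is a generic textbook illustration, not code from the papers below.

```python
import numpy as np

def soft_threshold(z, lam):
    """l1 (lasso) thresholding: shrinks every coefficient by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding (Fan & Li): soft-thresholds small z,
    interpolates in the middle range, leaves large z untouched."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    small = np.abs(z) <= 2 * lam
    big = np.abs(z) > a * lam
    mid = ~small & ~big
    out[small] = soft_threshold(z[small], lam)
    out[mid] = ((a - 1) * z[mid] - np.sign(z[mid]) * a * lam) / (a - 2)
    out[big] = z[big]          # no shrinkage: unbiased for large signals
    return out

z = np.array([0.5, 1.5, 2.5, 10.0])
soft = soft_threshold(z, 1.0)  # every surviving entry shrunk by 1
scad = scad_threshold(z, 1.0)  # large entries left unbiased
```

    The price of removing this bias is a nonconvex penalty, which is exactly why characterizing the computational complexity of such estimators matters.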

    Selected Publications

    Optimal Computational and Statistical Rates of Convergence for Sparse Nonconvex Learning Problems

    Zhaoran Wang, Han Liu, and Tong Zhang

    arXiv:1306.4960, 2013.

    Sparse Covariance Matrix Estimation with Eigenvalue Constraints

    Han Liu, Lie Wang, and Tuo Zhao

    Journal of Computational and Graphical Statistics (JCGS). To appear. 2013.

    Nonparanormal Graph Estimation via Smooth-projected Neighborhood Pursuit

    Tuo Zhao, Kathryn Roeder and Han Liu

    Neural Information Processing Systems (NIPS), 25, 2012.

    » Learn more

    Large-Scale Calibrated Inference


    Many statistical learning problems can be formulated in a multitask setting in which we want to simultaneously solve many learning subproblems. Calibrated inference automatically adjusts the regularization of each individual task according to its noise level or problem design, making the procedure tuning-insensitive and improving its finite-sample performance.
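
    The calibration idea behind tuning insensitivity can be sketched with a scaled (square-root) lasso: alternating a lasso fit with a noise-level update makes the effective regularization proportional to the unknown noise standard deviation, so one universal constant lam0 works across noise levels. Everything below (the toy coordinate-descent solver, constants, and demo data) is an illustrative sketch under these assumptions, not the papers' code.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    r = y.copy()
    for _ in range(n_iter):
        for j in range(d):
            r = r + X[:, j] * b[j]            # add back j-th contribution
            z = X[:, j] @ r
            b[j] = np.sign(z) * max(abs(z) - n * lam, 0.0) / col_sq[j]
            r = r - X[:, j] * b[j]
    return b

def scaled_lasso(X, y, lam0, n_outer=10):
    """Alternate lasso fits with noise-level updates: the penalty
    lam0 * sigma_hat self-calibrates to the unknown noise level."""
    n = len(y)
    sigma = np.std(y)                         # crude initial noise guess
    for _ in range(n_outer):
        b = lasso_cd(X, y, lam0 * sigma)
        sigma = np.linalg.norm(y - X @ b) / np.sqrt(n)
    return b, sigma

# demo: sparse signal with unknown noise level sigma = 0.5
rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.standard_normal((n, d))
b_true = np.zeros(d)
b_true[0], b_true[1] = 2.0, -1.5
y = X @ b_true + 0.5 * rng.standard_normal(n)
lam0 = np.sqrt(2 * np.log(d) / n)             # universal, noise-free choice
b_hat, sigma_hat = scaled_lasso(X, y, lam0)
```

    Note that lam0 is chosen without knowing the noise level; the iteration estimates sigma and scales the penalty accordingly, which is the sense in which the procedure is tuning-insensitive.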

    Selected Publications

    Multivariate Regression with Calibration

    Han Liu, Lie Wang, and Tuo Zhao

    arXiv:1305.2238, 2013.

    TIGER: A Tuning-Insensitive Approach for Optimal Graph Estimation

    Han Liu and Lie Wang

    arXiv:1209.2437, 2012.

    » Learn more

    Theoretical Foundations of High Dimensional Inference

    We develop fundamental theory for learning algorithms that exploit hidden structure to overcome the curse of dimensionality when analyzing massive high dimensional datasets. In particular, we focus on complex settings involving dependent data and nonconvex formulations. These theoretical insights provide new recipes for designing more effective learning algorithms.

    Selected Publications

    Optimal Rates of Convergence of Transelliptical Component Analysis

    Fang Han and Han Liu

    arXiv:1305.6916, 2013.

    Optimal Feature Selection in High-Dimensional Discriminant Analysis

    Mladen Kolar and Han Liu

    arXiv:1306.6557, 2013.

    Compressive Network Analysis

    Xiaoye Jiang, Yuan Yao, Han Liu, and Leonidas Guibas

    IEEE Transactions on Automatic Control. To appear. 2013.

    » Learn more

    Modern Scientific Applications


    Our research has a wide variety of applications, ranging from brain image analysis to genomic data analysis and social media analysis. The data in these fields are usually high dimensional and complex, which makes flexible nonparametric methods well suited for building accurate predictive models and discovering new scientific facts.

    Selected Publications

    Challenges of Big Data Analysis

    Jianqing Fan, Fang Han, and Han Liu

    arXiv:1308.1479, 2013.

    Statistical Analysis of Big Data on Pharmacogenomics

    Jianqing Fan and Han Liu

    Advanced Drug Delivery Reviews. To appear. 2013.

    Optimal Tests of Treatment Effects Using Sparse Linear Programming

    Michael Rosenblum, Han Liu, and En-Hsu Yen

    arXiv:1306.0964, 2013.

    Soft Null Hypotheses: A Case Study of Image Enhancement Detection in Brain Lesions

    Haochang Shou, Russell T. Shinohara, Han Liu, Daniel S. Reich, Ciprian M. Crainiceanu

    arXiv:1306.5524, 2013.

    » Learn more

    Reading Group

    We have a biweekly reading group. Topics this semester include optimization in infinite-dimensional spaces, homotopy algorithms, stochastic convex optimization, random matrix theory, and CUDA for GPU programming.

    Get In Touch

    Department of Operations Research and Financial Engineering
    Sherrerd Hall 224
    Princeton University, Princeton, NJ 08544
    Phone: +1 609 258 1788