Statistical Machine Learning Lab

Princeton University

Research Overview

As a computer scientist and statistician, I use computation and data as a lens to explore science and intelligence. I approach these questions through the twin windows of modern nonparametric methods and probabilistic graphical models. Nonparametric methods aim to draw inferences from high dimensional data under the weakest possible assumptions, while graphical models provide a unified framework that combines uncertainty (probability theory) with logical structure (graph theory) to model complex, real-world phenomena. Together they offer a powerful toolkit for challenging problems that shed light on the nature of machine intelligence and, if successful, would lead to significant future applications. My specific research focuses on nonparametric structure learning and representation learning. Success in this research has the potential to revolutionize the foundations of the second generation of artificial intelligence (i.e., statistical machine learning) and push the frontier of the third generation of artificial intelligence (i.e., deep learning). My theoretical research projects include:

  •        Nonparametric graphical models
  •        Transelliptical modeling and robust inference
  •        Nonconvex statistical optimization
  •        Post-regularization inference
  •        High dimensional nonparametrics
  •        Fundamental limits of computational models

My applied research interest is to develop a unified set of computational, statistical, and software tools to extract and interpret significant information from data collected across a variety of scientific areas. Current projects include:

  •        Nonparametric graphical models for brain science and genomics
  •        Modern machine learning methods for computational finance

Postdoc Position Available: one postdoc position is available. Please contact us.

    Nonparametric Graphical Model


    Graphical models have proven to be an extremely useful abstraction in statistical machine learning. The starting point is the graph of a distribution. While usually the graph is assumed given, we are interested in estimating the graph from data. In this project we develop a tractable subfamily of nonparametric graphical models. For example, one approach is named "the nonparanormal," which uses copula methods to transform the variables by monotonic functions, relaxing the fully parametric assumptions made by the Gaussian graphical model. Another approach is to restrict the family of allowed graphs to forest graphs, enabling the use of fully nonparametric density estimation. The resulting models and methods are easy to compute and theoretically well supported.
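The nonparanormal idea above can be sketched in a few lines: estimate each marginal's monotone transform via (Winsorized) normal scores of the empirical ranks, then fit a sparse Gaussian graphical model to the transformed data. This is a minimal illustration, not the exact estimator from the papers; the function `nonparanormal_scores` and the simulated data are ours, and the graphical lasso step uses scikit-learn's `GraphicalLasso`.

```python
import numpy as np
from scipy.stats import norm, rankdata
from sklearn.covariance import GraphicalLasso

def nonparanormal_scores(X, eps=None):
    """Map each column to Gaussian scores via its empirical ranks
    (a truncated normal-score transform, in the spirit of the nonparanormal)."""
    n, d = X.shape
    if eps is None:
        # truncation level shrinking with n, to stabilize the tails
        eps = 1.0 / (4.0 * n ** 0.25 * np.sqrt(np.pi * np.log(n)))
    U = rankdata(X, axis=0) / (n + 1.0)   # empirical CDF values in (0, 1)
    U = np.clip(U, eps, 1.0 - eps)        # Winsorize extreme ranks
    Z = norm.ppf(U)                       # monotone map to normal scores
    return Z / Z.std(axis=0)              # rescale to unit variance

# Gaussian copula with non-Gaussian (log-normal) margins
rng = np.random.default_rng(0)
latent = rng.multivariate_normal(
    np.zeros(3), [[1, .6, 0], [.6, 1, .6], [0, .6, 1]], size=500)
X = np.exp(latent)

Z = nonparanormal_scores(X)
model = GraphicalLasso(alpha=0.05).fit(Z)
print(np.round(model.precision_, 2))      # sparse estimated precision matrix
```

Because the rank transform is invariant to the marginal distortions, the estimated precision matrix reflects the latent Gaussian graph rather than the skewed observed margins.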

    Selected Publications

    Local and Global Inference for High Dimensional Gaussian Copula Graphical Models

    Quanquan Gu, Yuan Cao, Yang Ning, Han Liu

arXiv:1502.02347, 2015.

    On Semiparametric Exponential Family Graphical Models

    Zhuoran Yang, Yang Ning, and Han Liu

arXiv:1412.8697, 2014.

    High Dimensional Semiparametric Gaussian Copula Graphical Models

    Han Liu, Fang Han, Ming Yuan, John Lafferty, and Larry Wasserman

The Annals of Statistics, Volume 40 (4), pp. 2293-2326. 2012.

    Forest Density Estimation

Han Liu, Min Xu, Haijie Gu, Anupam Gupta, John Lafferty, and Larry Wasserman

Journal of Machine Learning Research (JMLR), Volume 12, pp. 907-951. 2011.

    » Learn more

Transelliptical Modeling and Robust Inference

The transelliptical model contains both finite- and infinite-dimensional parameters. It provides a unified approach to extend and robustify a large family of high dimensional multivariate methods, including sparse principal component analysis, sparse linear discriminant analysis, sparse covariance matrix estimation, and sparse graphical models. For inference, we develop a family of regularized rank-based estimators which directly estimate the finite-dimensional parameters, while treating the infinite-dimensional parameters as a nuisance. In particular, one can impose structural sparsity assumptions directly on the finite-dimensional parameters. Such an approach allows us to simultaneously gain statistical efficiency, modeling flexibility, and inferential robustness.
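A classic example of such a rank-based estimator is the Kendall's tau correlation: under an elliptical copula, the transform sin((pi/2) * tau) consistently recovers the latent correlation, regardless of the (infinite-dimensional) marginal transforms. The sketch below, with our own function name `transelliptical_corr` and simulated data, illustrates the idea under those assumptions:

```python
import numpy as np
from scipy.stats import kendalltau

def transelliptical_corr(X):
    """Rank-based estimate of the latent correlation matrix:
    R[j, k] = sin(pi/2 * tau_jk), which is invariant to monotone
    marginal transforms of the data."""
    d = X.shape[1]
    R = np.eye(d)
    for j in range(d):
        for k in range(j + 1, d):
            tau, _ = kendalltau(X[:, j], X[:, k])
            R[j, k] = R[k, j] = np.sin(0.5 * np.pi * tau)
    return R

# Latent Gaussian with correlation 0.7, observed through a
# heavy-tailed monotone distortion x -> sign(x) * |x|^3
rng = np.random.default_rng(1)
latent = rng.multivariate_normal([0, 0], [[1, 0.7], [0.7, 1]], size=400)
X = np.sign(latent) * np.abs(latent) ** 3

R = transelliptical_corr(X)
print(np.round(R, 2))   # off-diagonal entry close to the latent 0.7
```

A sample Pearson correlation on the distorted data would be badly biased by the cubic transform; the rank-based estimate is unaffected, which is the robustness the paragraph above describes.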

    Selected Publications

    QUADRO: A Supervised Dimension Reduction Method via Rayleigh Quotient Optimization

    Jianqing Fan, Tracy Ke, Han Liu, and Lucy Xia

The Annals of Statistics, Volume 43 (4), pp. 1498-1534. 2015.

    Scale-Invariant Sparse PCA on High Dimensional Meta-elliptical Data

    Fang Han and Han Liu

Journal of the American Statistical Association (Theory and Methods), Volume 109 (505), pp. 275-287. 2014.

    » Learn more

    Nonconvex Statistical Optimization

Model based optimization is a new area that lies at the intersection of statistical learning, robust optimization, and stochastic optimization. We apply ideas from modern probability and statistics to solve large and complex optimization problems. In particular, we develop a rigorous theoretical framework to characterize the interaction between informational and computational complexity.
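One concrete instance of nonconvex statistical optimization is sparse linear regression solved by iterative hard thresholding: plain gradient descent on the least-squares loss, projected after each step onto the (nonconvex) set of s-sparse vectors. This is a generic textbook sketch under a sparse-recovery assumption, not the specific algorithms from the papers below; the function `iht` and the simulation are ours.

```python
import numpy as np

def iht(X, y, s, step=None, iters=200):
    """Iterative hard thresholding: gradient descent on the
    least-squares loss, keeping only the s largest coordinates
    (projection onto the nonconvex s-sparse set)."""
    n, d = X.shape
    if step is None:
        step = 1.0 / np.linalg.norm(X, 2) ** 2    # 1 / largest eigenvalue of X'X
    beta = np.zeros(d)
    for _ in range(iters):
        beta = beta + step * X.T @ (y - X @ beta)  # gradient step
        keep = np.argsort(np.abs(beta))[-s:]       # indices of s largest entries
        mask = np.zeros(d, dtype=bool)
        mask[keep] = True
        beta[~mask] = 0.0                          # hard threshold the rest
    return beta

# Sparse ground truth: 3 nonzeros out of 50 coefficients
rng = np.random.default_rng(2)
n, d, s = 100, 50, 3
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:s] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = iht(X, y, s)
print(np.flatnonzero(beta_hat))   # recovered support
```

The projection step makes the feasible set nonconvex, yet under restricted-eigenvalue-type conditions the iterates still converge to a statistically accurate estimate, which is exactly the kind of informational/computational trade-off the paragraph above refers to.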

    Selected Publications

    Optimal Computational and Statistical Rates of Convergence for Sparse Nonconvex Learning Problems

    Zhaoran Wang, Han Liu, and Tong Zhang

arXiv:1306.4960, 2013.

    Sparse Covariance Matrix Estimation with Eigenvalue Constraints

    Han Liu, Lie Wang, and Tuo Zhao

    Journal of Computational and Graphical Statistics (JCGS). To appear. 2013.

    Nonparanormal Graph Estimation via Smooth-projected Neighborhood Pursuit

    Tuo Zhao, Kathryn Roeder and Han Liu

    Neural Information Processing Systems (NIPS), 25, 2012.

    » Learn more

    Large-Scale Calibrated Inference


Many statistical learning problems can be formulated in a multitask setting in which we want to simultaneously solve many learning subproblems. Calibrated inference automatically adjusts the regularization of each individual task with respect to its noise level or problem design, so that the procedure is tuning-insensitive and achieves improved finite-sample performance.
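The calibration idea can be illustrated with a scaled-lasso-style iteration for a single task: alternate between a lasso fit whose penalty is lam0 * sigma and a re-estimate of the noise level sigma from the residuals, so the effective regularization adapts to the noise automatically. This is a simplified sketch of the general principle, not the TIGER or multivariate-calibration estimators themselves; the function `scaled_lasso` and the simulated data are ours.

```python
import numpy as np
from sklearn.linear_model import Lasso

def scaled_lasso(X, y, lam0=None, iters=10):
    """Calibrated sparse regression: alternate a lasso fit with
    penalty lam0 * sigma and a residual-based update of the noise
    level sigma, so the tuning parameter is noise-adaptive."""
    n, d = X.shape
    if lam0 is None:
        lam0 = np.sqrt(2.0 * np.log(d) / n)   # universal, noise-free choice
    sigma = np.std(y)                          # crude initial noise estimate
    beta = np.zeros(d)
    for _ in range(iters):
        fit = Lasso(alpha=lam0 * sigma, fit_intercept=False).fit(X, y)
        beta = fit.coef_
        resid = y - X @ beta
        sigma = max(np.sqrt(np.mean(resid ** 2)), 1e-12)  # updated noise level
    return beta, sigma

# Sparse regression with unknown noise level sigma_true = 0.5
rng = np.random.default_rng(3)
n, d = 200, 100
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:2] = [2.0, -1.5]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

beta_hat, sigma_hat = scaled_lasso(X, y)
print(np.flatnonzero(beta_hat), round(sigma_hat, 2))
```

The key point is that lam0 depends only on (n, d), not on the unknown noise level; the noise enters through the data-driven sigma update, which is what makes the procedure tuning-insensitive.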

    Selected Publications

    Multivariate Regression with Calibration

    Han Liu, Lie Wang, and Tuo Zhao

arXiv:1305.2238, 2013.

    TIGER: A Tuning-Insensitive Approach for Optimal Graph Estimation

    Han Liu and Lie Wang

arXiv:1209.2437, 2012.

    » Learn more

    Theoretical Foundations of High Dimensional Inference

We develop the fundamental theory of learning algorithms that exploit hidden structure to overcome the curse of dimensionality when analyzing massive high dimensional datasets. In particular, we focus on complex settings involving dependent data and nonconvex formulations. These theoretical insights provide new recipes for designing more effective learning algorithms.

    Selected Publications

    Optimal Rates of Convergence of Transelliptical Component Analysis

    Fang Han and Han Liu

arXiv:1305.6916, 2013.

    Optimal Feature Selection in High-Dimensional Discriminant Analysis

    Mladen Kolar and Han Liu

arXiv:1306.6557, 2013.

    Compressive Network Analysis

    Xiaoye Jiang, Yuan Yao, Han Liu, and Leonidas Guibas

    IEEE Transactions on Automatic Control. To appear. 2013.

    » Learn more

    Modern Scientific Applications


Our research has a wide variety of applications, ranging from brain image analysis and genomics to social media analysis. The data in these fields are usually very high dimensional and complex, which makes flexible nonparametric methods suitable for building accurate predictive models and discovering new scientific facts.

    Selected Publications

    Challenges of Big Data Analysis

    Jianqing Fan, Fang Han, and Han Liu

arXiv:1308.1479, 2013.

    Statistical Analysis of Big Data on Pharmacogenomics

    Jianqing Fan and Han Liu

    Advanced Drug Delivery Reviews. To appear. 2013.

    Optimal Tests of Treatment Effects Using Sparse Linear Programming

    Michael Rosenblum, Han Liu, and En-Hsu Yen

arXiv:1306.0964, 2013.

    Soft Null Hypotheses: A Case Study of Image Enhancement Detection in Brain Lesions

    Haochang Shou, Russell T. Shinohara, Han Liu, Daniel S. Reich, Ciprian M. Crainiceanu

arXiv:1306.5524, 2013.

    » Learn more
    Page 1 of 2

    Reading Group

We have a biweekly reading group. The topics of this semester include optimization in infinite dimensional spaces, homotopy algorithms, stochastic convex optimization, random matrix theory, and CUDA for GPU programming.

    Get In Touch

    Department of Operations Research and Financial Engineering
Sherrerd Hall 224
Princeton University, Princeton, NJ 08544
Phone: +1 (609) 258-1788