Statistical Machine Learning Lab

Princeton University

Publications

Acknowledgement. The research results presented on this page are supported by the grants NSF IIS 1546482-BIGDATA, NIH R01MH102339, NSF IIS1408910, NSF IIS1332109, NIH R01GM083084, NIH R01HG06841.

Selected Papers
  •        Top 5 Selected Most Representative Papers
  • Journal Papers
  •        Area 1: Post-Regularization Inference
  •        Area 2: Model-based Statistical Optimization
  •        Area 3: Nonparametric Functional Sparsity
  •        Area 4: Graphical Model Inference
  •        Area 5: Statistical Computing
  •        Area 6: Modern Scientific Applications
  •        Invited Discussion Articles
  • Conference Papers
  •        Proceeding Papers in Machine Learning Conferences

    Top 5 Selected Papers

    Here I list five selected papers and preprints that best represent my recent research results.

    Selected Paper No.1:

    Combinatorial Inference for Graphical Models

    Matey Neykov, Junwei Lu and Han Liu

    The Annals of Statistics, Under review. 2016.

    Blurb. This paper proposes a new family of combinatorial inference problems that aim at testing the global structural properties of high dimensional graphical models. Our main contribution is a unified theory that characterizes the fundamental limits of, and efficient algorithms for, a large family of combinatorial inference problems. More details on combinatorial inference can be found in my talk slides here.

    Selected Paper No.2:

    Sharp Computational-Statistical Phase Transitions via Oracle Computational Model

    Zhaoran Wang, Quanquan Gu and Han Liu

    The Annals of Statistics, Under revision. 2015.

    Blurb. This paper proposes a unified theory to characterize the information-theoretic limits of statistical problems under a computational budget (a.k.a. computational lower bounds). Unlike previous approaches, which use Turing machines to characterize computation, we exploit the oracle computational model (or statistical query model), which is more amenable to theoretical analysis. More details on computational lower bounds can be found here.

    Selected Paper No.3:

    A General Theory of Hypothesis Tests and Confidence Regions for Sparse High Dimensional Models

    Yang Ning and Han Liu

    The Annals of Statistics. Accepted. 2014.

    Blurb. This paper considers both hypothesis tests and confidence regions for generic penalized M-estimators. Unlike most existing inferential methods which are tailored for individual models, our approach provides a general framework for high dimensional sparse inference (also called post-regularization inference) and is applicable to a wide range of applications. More details on such post-regularization inference can be found here.

    Selected Paper No.4:

    Pathwise Coordinate Optimization for Sparse Learning: Algorithm and Theory

    Tuo Zhao, Han Liu, and Tong Zhang

    The Annals of Statistics, To appear. 2014

    Blurb. This paper develops a model-based statistical optimization theory to analyze pathwise coordinate optimization algorithms. We solve a long-standing open problem by providing a rigorous theory that justifies the superior performance of pathwise coordinate optimization strategies for both convex and nonconvex sparse learning problems. More details on model-based statistical optimization can be found here.

    Selected Paper No.5:

    High Dimensional Semiparametric Gaussian Copula Graphical Models

    Han Liu, Fang Han, Ming Yuan, John Lafferty, and Larry Wasserman

    The Annals of Statistics, Volume 40(4), pp2293-2326. 2012.

    Blurb. This paper proposes a regularized rank-based estimator (e.g., based on Kendall's tau correlation coefficients) to fit nonGaussian graphical models. We prove that the proposed procedure simultaneously achieves the optimal parametric rates of convergence for both graph recovery and parameter estimation. More details on nonparametric and semiparametric graphical models can be found here.

    Area 1: Post-Regularization Inference

    Post-regularization inference provides a unified and efficient framework for uncertainty assessment of high dimensional, sparsely regularized estimators.

    Distributed Estimation and Inference with Statistical Guarantees

    with H Battey, J Fan, J Lu and Z Zhu

    The Annals of Statistics. To appear, 2016.

    A Likelihood Ratio Framework for High Dimensional Semiparametric Regression

    with Y Ning and T Zhao

    The Annals of Statistics. Accepted, 2016.

    A Partially Linear Framework for Massive Heterogeneous Data

    with T Zhao, G Cheng

    The Annals of Statistics. Volume 44(4), pp1400-1437, 2016.

    Testing and Confidence Intervals for High Dimensional Proportional Hazards Model

    with E X Fang and Y Ning

    Journal of the Royal Statistical Society: Series B. Accepted, 2016.

    Area 2: Model-based Statistical Optimization

    Statistical optimization applies model-based statistical thinking to develop new methods and theory for solving large-scale convex and even nonconvex optimization problems.

    Pathwise Coordinate Optimization for Sparse Learning: Algorithm and Theory

    with T Zhao and T Zhang

    The Annals of Statistics. To appear, 2016.

    iLAMM for Sparse Learning: Simultaneous Control of Algorithmic Complexity and Statistical Error

    with J Fan, Q Sun and T Zhang

    The Annals of Statistics. To appear, 2016.

    Optimal Computational and Statistical Rates of Convergence for Sparse Nonconvex Learning Problems

    with Z Wang and T Zhang

    The Annals of Statistics. Volume 42(6), pp2164-2201, 2014.

    Provable Sparse Tensor Decomposition

    with W Sun, J Lu, and G Cheng

    Journal of the Royal Statistical Society: Series B. Accepted, 2016.

    Stochastic Compositional Gradient Descent: Algorithms for Minimizing Compositions of Expected-Value Functions

    with M Wang and E Fang

    Mathematical Programming: Series A. Accepted, 2016.

    A Strictly Contractive Peaceman-Rachford Splitting Method for Convex Program

    with B He, Z Wang and X Yuan

    SIAM Journal on Optimization. Volume 24(3), pp1011-1040, 2014.

    Optimal Tests of Treatment Effects for the Overall Population and Two Subpopulations in Randomized Trials, using Sparse Linear Programming

    with M Rosenblum, E Yen

    Journal of the American Statistical Association. Volume 109(507), pp1216-1228, 2014.

    Area 3: Nonparametric Functional Sparsity

    Nonparametric functional sparsity studies systematic methods for imposing interpretable structural regularizations on nonparametric models to attain identifiability and inferential tractability.

    QUADRO: A Supervised Dimension Reduction Method via Rayleigh Quotient

    with J Fan, Q Sun and T Zhang

    The Annals of Statistics. Volume 43(4), pp1498-1534, 2015.

    ECA: High Dimensional Elliptical Component Analysis in nonGaussian Distributions

    with F Han

    Journal of the American Statistical Association. Accepted, 2016.

    Mining Massive Amounts of Genomic Data: A Semiparametric Topic Modeling Approach

    with E X Fang, M Li, and M I Jordan

    Journal of the American Statistical Association. Accepted, 2016.

    Scale-Invariant Sparse PCA on High Dimensional Meta-elliptical Data

    with F Han

    Journal of the American Statistical Association. Volume 109(505), pp275-287, 2014.

    Sparse Additive Models

    with P Ravikumar, J Lafferty and L Wasserman

    Journal of the Royal Statistical Society: Series B. Volume 71(5), pp1009-1030, 2009.

    Optimal Feature Selection in High-Dimensional Discriminant Analysis

    with M Kolar

    IEEE Transactions on Information Theory. Volume 61(2), pp1063-1083, 2015.

    Calibrated Precision Matrix Estimation for High Dimensional Elliptical Distributions

    with T Zhao

    IEEE Transactions on Information Theory. Volume 60(12), pp7874-7887, 2014.

    Area 4: Graphical Model Inference

    Graphical model inference deals with estimation and uncertainty assessment of the topological structure of the graphs under graphical models.

    High Dimensional Semiparametric Gaussian Copula Graphical Models

    with F Han, M Yuan, J Lafferty and L Wasserman

    The Annals of Statistics. Volume 40(4), pp2293-2326, 2012.

    High Dimensional Semiparametric Latent Graphical Model for Mixed Data

    with J Fan, Y Ning, and H Zou

    Journal of the Royal Statistical Society: Series B. Accepted, 2016.

    Joint Estimation of Multiple Graphical Models from High Dimensional Time Series

    with H Qiu, F Han and B Caffo

    Journal of the Royal Statistical Society: Series B. Accepted, 2016.

    Replicates in high dimensions, with applications to latent variable graphical models

    with K. M. Tan, Y Ning, and D Witten

    Biometrika. Accepted, 2016.

    High-dimensional semiparametric bigraphical models

    with Y Ning

    Biometrika. Volume 100(3), pp655-670, 2013.

    TIGER: A Tuning-Insensitive Approach for Optimally Estimating Gaussian Graphical Models

    with L Wang

    Electronic Journal of Statistics. Accepted, 2016.

    Nonparametric Latent Tree Graphical Models: Inference, Estimation, and Learning

    with L Song, A Parikh and E. P. Xing

    Journal of Machine Learning Research. Accepted, 2016.

    A Direct Estimation of High Dimensional Stationary Vector Autoregressions

    with F Han and H Lu

    Journal of Machine Learning Research. Volume 16, pp3115-3150, 2015.

    The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation

    with X Li, T Zhao and X Yuan

    Journal of Machine Learning Research. Volume 16, pp553-557, 2015.

    Graph Estimation From Multi-attribute Data

    with M Kolar and E.P. Xing

    Journal of Machine Learning Research. Volume 15, pp1713-1750, 2014.

    The fastclime Package for Linear Programming and Large-Scale Precision Matrix Estimation

    with H Pang and R Vanderbei

    Journal of Machine Learning Research. Volume 15, pp489-493, 2014.

    The huge Package for High-dimensional Undirected Graph Estimation in R

    with T Zhao, K Roeder, J Lafferty and L Wasserman

    Journal of Machine Learning Research. Volume 13, pp1059-1062, 2012.

    Forest Density Estimation

    with M Xu, H Gu, A Dasgupta, J Lafferty, L Wasserman

    Journal of Machine Learning Research. Volume 12, pp907-951, 2011.

    The Nonparanormal: Semiparametric Estimation of High-dimensional Undirected Graphical Models

    with J Lafferty and L Wasserman

    Journal of Machine Learning Research. Volume 10, pp2157-2192, 2009.

    Sparse Nonparametric Graphical Models

    with J Lafferty and L Wasserman

    Statistical Science. Volume 27(4), pp519-537, 2012.

    Area 5: Statistical Computing

    Statistical computing aims to develop efficient and effective numerical algorithms to fit statistical models to large and complex datasets.

    Optimization for Compressed Sensing: the Simplex Method and Kronecker Sparsification

    with R Vanderbei, L Wang and K Lin

    Mathematical Programming Computation. Accepted, 2016.

    Generalized Alternating Direction Method of Multipliers: New Theoretical Insight and Application

    with E. X. Fang, B. He and X Yuan

    Mathematical Programming Computation. Volume 7(2), pp149-187, 2015.

    Accelerated Path-following Iterative Shrinkage Thresholding Algorithm

    with T Zhao

    Journal of Computational and Graphical Statistics. Volume 25(4), pp1272-1296, 2016.

    Soft Null Hypotheses: A Case Study of Image Enhancement Detection in Brain Lesions

    with H Shou, R. T. Shinohara, D. S. Reich, and C. M. Crainiceanu

    Journal of Computational and Graphical Statistics. Volume 25(2), pp570-588, 2016.

    Sparse Covariance Estimation with Eigenvalue Constraints

    with L Wang, T Zhao

    Journal of Computational and Graphical Statistics. Volume 23(2), pp439-459, 2014.

    Positive Semidefinite Rank-based Correlation Matrix Estimation with Application to Semiparametric Graph Estimation

    with T Zhao and K Roeder

    Journal of Computational and Graphical Statistics. Volume 23(4), pp895-922, 2014.

    Area 6: Modern Scientific Applications

    My applied research interests include brain imaging data analysis, genomics, cognitive neuroscience, social network analysis and financial econometrics.

    Patterns and Rates of Exonic de novo Mutations in Autism Spectrum Disorders

    with B. M. Neale et al.

    Nature. Volume 485, pp242-245, 2012.

    Robust Inference of Risks of Large Portfolios

    with J Fan, F Han, and B Vickers

    Journal of Econometrics. Accepted, 2016.

    An Overview on the Estimation of Large Covariance and Precision Matrices

    with J Fan and Y Liao

    The Econometrics Journal. Volume 19(1), pp1-32, 2016.

    A Semiparametric Graphical Modelling Approach for Large-Scale Equity Selection

    with J Mulvey and T Zhao

    Quantitative Finance. Volume 16(7), pp1053-1067, 2016.

    Identifying Economic Regimes: Reducing Downside Risks for University Endowments and Foundations

    with J Mulvey

    Journal of Portfolio Management. Accepted, 2016.

    Sparse Median Graphs Estimation in a High Dimensional Semiparametric Model

    with F Han, X Han, and B Caffo

    The Annals of Applied Statistics. Accepted, 2016.

    Calibrated Multivariate Regression with Application to Neural Basis Discovery

    with L Wang and T Zhao

    Journal of Machine Learning Research. Volume 16, pp1579-1606, 2015.

    High Dimensional Semiparametric Scale-invariant PCA

    with F Han

    IEEE Transactions on Pattern Analysis and Machine Intelligence. Volume 36(10), pp2016-2032, 2015.

    glmgraph: An R Package for Variable Selection and Predictive Modeling of Structured Genomic Data

    with L Chen, J. A. Kocher, H Li, J Chen

    Bioinformatics. Volume 31(24), pp3991-3993, 2015.

    Compressive Network Analysis

    with X Jiang, Y Yao and L Guibas

    IEEE Transactions on Automatic Control. Volume 59(11), pp2946-2961, 2014.

    Challenges of Big Data Analysis

    with J Fan and F Han

    National Science Review. Volume 2(1), pp1-24, 2014.

    CODA: High Dimensional Copula Discriminant Analysis

    with F Han and T Zhao

    Journal of Machine Learning Research. Volume 14, pp629-671, 2013.

    Statistical Analysis of Big Data on Pharmacogenomics

    with J Fan

    Advanced Drug Delivery Reviews. Volume 65(7), pp987-1000, 2013.

    An Efficient Optimization Algorithm for Structured Sparse CCA, with Applications to eQTL Mapping

    with X Chen

    Statistics in Biosciences. Volume 4(1), pp3-26, 2012.

    Automated Diagnoses of Attention Deficit Hyperactive Disorder using Magnetic Resonance Imaging

    with AD Barber, B Caffo, A Eloyan, F Han, S Joel, SH Mostofsky, J Muschelli, MB Nebel, JJ Pekar, T Zhao

    Frontiers in Systems Neuroscience. Volume 6(61), pp1-9, 2012.

    Invited Discussion Articles

    On ‘New statistics for old—measuring the wellbeing of UK’

    Journal of the Royal Statistical Society: Series A. 2016.

    On ‘Perils and potentials of self-selected entry to epidemiological studies and surveys’

    with Y Ning

    Journal of the Royal Statistical Society: Series A. 2015.

    On ‘Statistical Modelling of Citation Exchange Among Statistics Journals’

    Journal of the Royal Statistical Society: Series A. 2015.

    On ‘Sequential Quasi-Monte-Carlo Sampling’

    with Y Zeng

    Journal of the Royal Statistical Society: Series B. 2015.

    On ‘Multiscale Change-Point Inference’

    Journal of the Royal Statistical Society: Series B. 2014.

    On ‘Large Covariance Estimation by Thresholding Principal Orthogonal Complements’

    with L Wang

    Journal of the Royal Statistical Society: Series B. 2014.

    On ‘Analysis of Forensic DNA Mixtures with Artefacts’

    with J Lu

    Journal of the Royal Statistical Society: Series C. 2014.

    Proceeding Papers in Machine Learning Conferences

    Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes

    with C.J. Li and Z Wang

    Neural Information Processing Systems (NIPS). 2016.

    More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning

    with X Yi, Z Wang, Z Yang, C Caramanis

    Neural Information Processing Systems (NIPS). 2016.

    Agnostic Estimation for Misspecified Phase Retrieval

    with M Neykov, Z Wang

    Neural Information Processing Systems (NIPS). 2016.

    Blind Attacks on Machine Learners

    with A. J. Beatson, Z Wang

    Neural Information Processing Systems (NIPS). 2016.

    Sparse Nonlinear Regression: Parameter Estimation and Asymptotic Inference

    with Z Yang, Z Wang, Y. C. Eldar, T Zhang

    International Conference on Machine Learning (ICML). 2016.

    On the Statistical Limits of Convex Relaxations: A Case Study

    with Z Wang and Q Gu

    International Conference on Machine Learning (ICML). 2016.

    Stochastic Variance Reduced Optimization for Nonconvex Sparse Learning

    with X Li, T Zhao, R Arora and J Haupt

    International Conference on Machine Learning (ICML). 2016.

    Low-Rank and Sparse Structure Pursuit via Alternating Minimization

    with Q Gu, Z Wang

    International Conference on Artificial Intelligence and Statistics (AISTATS). 2016.

    An Improved Convergence Analysis of Cyclic Block Coordinate Gradient Descent Methods for Strongly Convex Minimization

    with T Zhao, X Li, R Arora, M Hong

    International Conference on Artificial Intelligence and Statistics (AISTATS). 2016.

    A Lasso-based Sparse Knowledge Gradient Policy for Sequential Optimal Learning

    with Y Li and W Powell

    International Conference on Artificial Intelligence and Statistics (AISTATS). 2016.

    A Truth Discovery Approach with Theoretical Guarantee

    with H Xiao, J Gao, Z Wang, S Wang, and L Su

    The 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). 2016.

    Robust Estimation of Transition Matrices in High Dimensional Heavy-tailed Vector Autoregressive Processes

    with H Qiu, F Han and B Caffo

    International Conference on Machine Learning (ICML). 2015.

    Robust Portfolio Optimization

    with H Qiu, F Han and B Caffo

    Neural Information Processing Systems (NIPS). 2015.

    A Nonconvex Optimization Framework for Low Rank Matrix Estimation

    with T Zhao and Z Wang

    Neural Information Processing Systems (NIPS). 2015.

    Nonconvex Statistical Optimization for Sparse Tensor Graphical Model

    with W Sun, Z Wang and G Cheng

    Neural Information Processing Systems (NIPS). 2015.

    Optimal Linear Estimation under Unknown Nonlinear Transform

    with X Yi, Z Wang and C Caramanis

    Neural Information Processing Systems (NIPS). 2015.

    Local Smoothness in Variance Reduced Optimization

    with D Vainsencher and T Zhang

    Neural Information Processing Systems (NIPS). 2015.

    High Dimensional EM Algorithm: Statistical Optimization and Asymptotic Normality

    with Z Wang, Q Gu and Y Ning

    Neural Information Processing Systems (NIPS). 2015.

    Multivariate Regression with Calibration

    with L Wang and T Zhao

    Neural Information Processing Systems (NIPS). 2014.

    Mode Estimation for High Dimensional Discrete Tree Graphical Models

    with C Chen, T Zhao and D Metaxas

    Neural Information Processing Systems (NIPS). 2014.

    Oracle Sparse PCA and Its Inference

    with Q Gu and Z Wang

    Neural Information Processing Systems (NIPS). 2014.

    Accelerated Mini-batch Randomized Block Coordinate Descent Method

    with T Zhao, Y Wang and R Arora

    Neural Information Processing Systems (NIPS). 2014.

    Tighten after Relax: Minimax-Optimal Sparse PCA in Polynomial Time

    with Z Wang

    Neural Information Processing Systems (NIPS). 2014.

    Robust Sparse Principal Component Regression

    with F Han

    Neural Information Processing Systems (NIPS). 2013.

    Sparse Inverse Covariance Estimation with Calibration

    with T Zhao

    Neural Information Processing Systems (NIPS). 2013.

    Transition Matrix Estimation in High Dimensional Vector Autoregressive Models

    with F Han

    International Conference on Machine Learning (ICML). 2013.

    Feature Selection in High-Dimensional Classification

    with M Kolar

    International Conference on Machine Learning (ICML). 2013.

    PCA on non-Gaussian Dependent Data

    with F Han

    International Conference on Machine Learning (ICML). 2013.

    Markov Network Estimation from Multi-attribute Data

    with M Kolar and E.P. Xing

    International Conference on Machine Learning (ICML). 2013.

    Sparse Principal Component Analysis for High Dimensional Multivariate Time Series

    with Z Wang and F Han

    International Conference on Artificial Intelligence and Statistics (AISTATS). 2013.

    High Dimensional Semiparametric Scale-invariant PCA

    with F Han

    Neural Information Processing Systems (NIPS). 2012.

    Exponential Concentration for Mutual Information Estimation, with Application to Forest Graphical Models

    with J Lafferty and L Wasserman

    Neural Information Processing Systems (NIPS). 2012.

    Transelliptical Graphical Models

    with F Han

    Neural Information Processing Systems (NIPS). 2012.

    Transelliptical Component Analysis

    with F Han

    Neural Information Processing Systems (NIPS). 2012.

    High-dimensional Nonparanormal Graph Estimation via Smooth-projected Neighborhood Pursuit

    with T Zhao and K Roeder

    Neural Information Processing Systems (NIPS). 2012.

    The Nonparanormal SKEPTIC

    with F Han, J Lafferty and L Wasserman

    International Conference on Machine Learning (ICML). 2012.

    Sparse Additive Machine

    with T Zhao

    International Conference on Artificial Intelligence and Statistics (AISTATS). 2012.

    Structured Sparse Canonical Correlation Analysis

    with X Chen

    International Conference on Artificial Intelligence and Statistics (AISTATS). 2012.

    Detecting Network Cliques using Radon Basis Pursuit

    with X Jiang, Y Yao, L Guibas

    International Conference on Artificial Intelligence and Statistics (AISTATS). 2012.

    Marginal Regression For Multitask Learning

    with M Kolar

    International Conference on Artificial Intelligence and Statistics (AISTATS). 2012.

    Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models

    with K Roeder and L Wasserman

    Neural Information Processing Systems (NIPS). 2010.

    Graph-Valued Regression

    with X Chen, J Lafferty and L Wasserman

    Neural Information Processing Systems (NIPS). 2010.

    Multivariate Dyadic Regression Trees for Sparse Learning Problems

    with X Chen

    Neural Information Processing Systems (NIPS). 2010.

    The Group Dantzig Selector

    with J Zhang, J Liu, X Jiang

    International Conference on Artificial Intelligence and Statistics (AISTATS). 2010.

    Learning Spatial-Temporal Varying Graphs with Applications to Climate Data Analysis

    with X Chen, Y Liu, J Carbonell

    The 24th AAAI Conference on Artificial Intelligence (AAAI). 2010.

    Nonparametric Greedy Algorithms for the Sparse Learning Problem

    with X Chen

    Neural Information Processing Systems (NIPS). 2009.

    On the Estimation Consistency of the Group Lasso and its Applications

    with J Zhang

    International Conference on Artificial Intelligence and Statistics (AISTATS). 2009.

    Blockwise Coordinate Descent Procedures for the Multi-task Lasso, with Applications to Neural Semantic Basis Discovery

    with M Palatucci and J Zhang

    International Conference on Machine Learning (ICML). 2009.

    Nonparametric Regression and Classification with Joint Sparsity Constraints

    with J Lafferty and L Wasserman

    Neural Information Processing Systems (NIPS). 2008.

    SpAM: Sparse Additive Models

    with P Ravikumar, J Lafferty and L Wasserman

    Neural Information Processing Systems (NIPS). 2007.

    Sparse Nonparametric Density Estimation in High Dimensions using the Rodeo

    with J Lafferty and L Wasserman

    International Conference on Artificial Intelligence and Statistics (AISTATS). 2007.

    Towards the Prediction of Protein Abundance from Tandem Mass Spectrometry Data

    with A. J. Bonner

    SIAM International Conference on Data Mining (SDM). 2006.

    Reading Group

    We have a biweekly reading group. This semester's topics include optimization in infinite dimensional spaces, homotopy algorithms, stochastic convex optimization, random matrix theory, and CUDA for GPU programming.

    Get In Touch

    Department of Operations Research and Financial Engineering
    Sherrerd Hall 224
    Princeton University, NJ 08544
    Phone: +1 609 258 1788
    Email: hanliu@princeton.edu