Research Overview

Effective data analysis at scale is one of the greatest challenges facing the modern information society. To meet this challenge, we need to understand complexity, both informational and computational. My research lies at the intersection of modern statistics and computer science. In particular, I am interested in the statistical and computational foundations of Big Data analysis. On the applied side, my goal is to develop a unified set of computational, statistical, and software tools to extract and interpret significant information from data collected from a variety of modern sources (social media, biological experiments, medical diagnostics, etc.).

Statistical Foundations of Big Data Analysis

We develop flexible statistical models and computational algorithms that exploit hidden structure to overcome the curse of dimensionality when analyzing massive, high-dimensional datasets. These algorithms have strong theoretical guarantees and provide new recipes for many important inference tasks, ranging from unsupervised exploratory data analysis to supervised predictive modeling. We currently focus on:

  • Aim 1: Large-scale semiparametric inference: nonparametric modeling with a parametric rate
  • Aim 2: Structured nonparametric models: sparse additive modeling and forest modeling
  • Aim 3: Tuning-insensitive learning and dependent data analysis

    Computational Foundations of Big Data Analysis

    Statistical optimization is a new area that lies at the intersection of statistical learning, robust optimization, and stochastic optimization. We apply ideas from modern probability and statistics to solve large and complex optimization problems. In particular, we develop a rigorous theoretical framework to characterize the interaction between informational and computational complexity. Our current research includes:

  • Aim 1: Informational and computational complexity tradeoff
  • Aim 2: Structured nonsmooth convex optimization and infinite-dimensional optimization
  • Aim 3: Statistical regularization and robust optimization

    Scientific Applications

    Our research has a wide variety of applications, ranging from genomics and bio-imaging to social media analysis. The data in these fields are usually very high-dimensional and complex, which makes flexible nonparametric methods well suited to building accurate predictive models and discovering new scientific facts:

  • Aim 1: Genomic data analysis and text mining
  • Aim 2: Statistical analysis of large-scale DCE-MRI and fMRI data
  • Aim 3: Longitudinal social network and regulatory network analysis

    Reading Group

    We hold a biweekly reading group. This semester's topics include optimization in infinite-dimensional spaces, homotopy algorithms, stochastic convex optimization, random matrix theory, and CUDA for GPU programming.

    Get In Touch

    Department of Operations Research and Financial Engineering
    Sherrerd Hall 224
    Princeton University, NJ 08544
    Phone: +609 258 1788
    Email: hanliu@princeton.edu