[Machine Learning]

Samory Kpotufe

E-mail: samory@princeton.edu
Office: Room 327, Operations Research and Financial Engineering, Sherrerd Hall, Princeton University.

I'm an Assistant Professor at ORFE, Princeton University. Previously, I was a Research Assistant Professor at the Toyota Technological Institute at Chicago. Before that, I was a researcher at the Max Planck Institute for Intelligent Systems, where I worked in Bernhard Schoelkopf's department, in Ulrike von Luxburg's learning theory group. I received my PhD in Computer Science (Sept 2010) from the University of California, San Diego, advised by Sanjoy Dasgupta.


ORF 245/ENG 245 (Spring, 2014-2015, 2015-2016, 2016-2017). Fundamentals of Statistics. [Syllabus]

ORF 524 (Fall 2015-2016, 2015-2016). Statistical Theory and Methods. [Syllabus]

- NIPS 2016 Workshop on Nonparametrics just got approved. We're looking forward to an exciting array of speakers.
- 2016 seed grant from the Siebel Energy Institute, jointly with Prof. Nick Feamster, to work on ML challenges in the Internet of Things (smart homes and cities). Initial work will consider anomalous activity detection.
- Lectured at Machine Learning Summer School (MLSS) 2016, Cadiz.
- Princeton Engineering Commendation List for Outstanding Teaching (ORF 524, Fall 2015-2016).
I work in machine learning, with an emphasis on nonparametric methods and high-dimensional statistics. Generally, I am interested in understanding the inherent difficulty of high-dimensional learning problems (e.g., most modern data-mining problems). The nonparametric setting is attractive because it captures situations where we have little domain knowledge. It therefore allows a degree of abstraction when dealing with difficult high-dimensional learning problems.

More specifically, I'd like to understand the quantities and structures that characterize the complexity of high-dimensional problems (e.g., intrinsic dimension, sparsity, clusters, smoothness). Here, complexity is stated in terms of the resources required to learn, such as the amount of data or computation time. Using this understanding, the goal is to derive or improve procedures that work well under various modern constraints.
Common preprocessing steps for high-dimensional problems include "dimension reduction" and "dictionary learning" (for sparse representation). My research, along with many other interesting recent results, is uncovering high-dimensional situations where such preprocessing has little statistical benefit. Certain learners, such as kNN and some classification trees, already perform as well as if they were operating in a lower-dimensional space or on a sparse representation.
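The following is a minimal illustrative sketch of the situation described above. It applies plain k-NN regression directly in a 50-dimensional ambient space, with no dimension-reduction step. The data, however, lie on a 1-dimensional curve. Everything here is an assumption made for illustration: the curve `embed`, the function `target`, the noise level, and the sample sizes are hypothetical. The sketch does not reproduce the analysis or rates of the work listed on this page.

```python
import math
import random

def knn_regress(train_x, train_y, query, k):
    """Predict at `query` by averaging the labels of its k nearest training points (Euclidean)."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    return sum(y for _, y in dists[:k]) / k

def embed(t, ambient_dim):
    """Hypothetical data: map a scalar t in [0, 1] onto a smooth 1-D curve in R^ambient_dim."""
    return [math.cos(t * (j + 1)) / (j + 1) for j in range(ambient_dim)]

def target(t):
    """Hypothetical regression function along the curve."""
    return math.sin(2 * math.pi * t)

random.seed(0)
D = 50   # ambient dimension; the intrinsic dimension of the data is 1
n = 500  # training sample size
ts = [random.random() for _ in range(n)]
train_x = [embed(t, D) for t in ts]
train_y = [target(t) + random.gauss(0, 0.1) for t in ts]

# Evaluate on fresh points from the same curve, working only in the 50-D coordinates.
test_ts = [random.random() for _ in range(100)]
mse = sum(
    (knn_regress(train_x, train_y, embed(t, D), k=10) - target(t)) ** 2
    for t in test_ts
) / len(test_ts)
print(f"test MSE: {mse:.4f}")
```

The learner never sees `t` and is never told that the data are one-dimensional. Its neighborhoods are defined only by distances in the ambient space.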
A longer-term goal is to derive deployable adaptive procedures, i.e., practical procedures that can self-tune to unknown structure in data. Self-tuning is desirable under various constraints. Examples include time-complexity constraints, space constraints with sequential data, and pointwise optimality rather than average performance. A more in-depth discussion can be found in my research statement. Also, here is a talk I've given a few times on nonparametric regression in high-dimensional spaces and on adaptivity to important problem parameters.
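As a minimal point of reference for what "self-tuning" means, the sketch below picks the number of neighbors k for k-NN regression by minimizing error on a random hold-out split. This is a generic, textbook form of data-driven tuning. It is not the adaptive procedures developed in this research, which target guarantees that simple hold-out selection does not claim. The helper `select_k`, the candidate grid, the hold-out fraction, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random

def knn_regress(train_x, train_y, query, k):
    """Average the labels of the k nearest training points (Euclidean)."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    return sum(y for _, y in dists[:k]) / k

def select_k(xs, ys, candidates, holdout_frac=0.3, seed=0):
    """Hypothetical helper: return the candidate k with the smallest hold-out squared error."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_hold = int(len(xs) * holdout_frac)
    hold, fit = idx[:n_hold], idx[n_hold:]
    fit_x = [xs[i] for i in fit]
    fit_y = [ys[i] for i in fit]

    def holdout_error(k):
        return sum((knn_regress(fit_x, fit_y, xs[i], k) - ys[i]) ** 2 for i in hold) / len(hold)

    # min() returns the first candidate among ties.
    return min(candidates, key=holdout_error)

# Synthetic 1-D example with noticeable noise (all constants are illustrative).
rng = random.Random(1)
ts = [rng.random() for _ in range(400)]
xs = [[t] for t in ts]
ys = [math.sin(2 * math.pi * t) + rng.gauss(0, 0.5) for t in ts]
chosen = select_k(xs, ys, candidates=[1, 5, 25, 125])
print("selected k:", chosen)
```

Under this noise level, k = 1 overfits and k = 125 averages over too large a region. The hold-out error should therefore favor an intermediate value.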


Samory Kpotufe. Density Ratio Estimation with Data-driven Tuning: Optimal Rates.

Heinrich Jiang, Samory Kpotufe. Modal-set estimation with an application to clustering. [ pdf ]

Samory Kpotufe, Nakul Verma. Time-Accuracy Tradeoffs in Kernel Prediction: Controlling Prediction Quality. [ pdf ]


Samory Kpotufe, Abdeslam Boularias, Thomas Schultz, Kyoungok Kim. Gradients Weights improve Regression and Classification.
Journal of Machine Learning Research (JMLR) 2016. [ pdf ]

Samory Kpotufe, Ruth Urner, Shai Ben-David. Hierarchical label queries with data-dependent partitions.
Conference on Learning Theory (COLT) 2015. [ pdf ]

Sanjoy Dasgupta, Samory Kpotufe. Optimal rates for k-NN density and mode estimation.
Neural Information Processing Systems (NIPS) 2014. [ pdf | slides (CIRM, Luminy)]

Kamalika Chaudhuri, Sanjoy Dasgupta, Samory Kpotufe, Ulrike von Luxburg. Consistent procedures for cluster-tree estimation and pruning.
IEEE Transactions on Information Theory, 60(12):7900-7912, 2014. [ pdf ]

Shubhendu Trivedi, Jialei Wang, Samory Kpotufe, Gregory Shakhnarovich. A Consistent Estimator of the Expected Gradient Outerproduct.
Uncertainty in Artificial Intelligence (UAI) 2014. [ pdf ]

Samory Kpotufe, Eleni Sgouritsa, Dominik Janzing, Bernhard Schoelkopf. Consistency of Causal Inference under the Additive Noise Model.
International Conference on Machine Learning (ICML) 2014. [ pdf ]

Samory Kpotufe, Vikas K. Garg. Adaptivity to Local Smoothness and Dimension in Kernel Regression.
Neural Information Processing Systems (NIPS) 2013. [ pdf ]

Samory Kpotufe, Francesco Orabona. Regression-tree Tuning in a Streaming Setting.
Neural Information Processing Systems (NIPS) 2013. Selected for Spotlight (one of 52/1420 submissions). [ pdf ]

Samory Kpotufe, Abdeslam Boularias. Gradient weights help nonparametric regressors.
Neural Information Processing Systems (NIPS) 2012. Selected for Plenary Presentation (one of 20/1467 submissions). [ pdf ]

Samory Kpotufe. k-NN Regression adapts to local intrinsic dimension.
Neural Information Processing Systems (NIPS) 2011. Selected for Plenary Presentation (one of 20/1400 submissions). [ pdf ]

Samory Kpotufe, Ulrike von Luxburg. Pruning nearest neighbor cluster trees.
International Conference on Machine Learning (ICML) 2011. [ pdf | slides ]

Samory Kpotufe, Sanjoy Dasgupta. A tree-based regressor that adapts to intrinsic dimension.
Invited to Special Issue of the Journal of Computer and System Sciences (JCSS) 2011. [ pdf ]

Samory Kpotufe. The curse of dimension in nonparametric regression.
UCSD, PhD Dissertation 2010. [ pdf ]

Eric Flynn, Samory Kpotufe, et al. SHMTools: a new embeddable software package for SHM applications.
SPIE 2010.

Samory Kpotufe. Escaping the curse of dimensionality with a tree-based regressor.
Conference on Learning Theory (COLT) 2009. Mark Fulk Best Student Paper. [ pdf | slides ]

Nakul Verma, Samory Kpotufe, Sanjoy Dasgupta. Which spatial partition trees are adaptive to intrinsic dimension?
Uncertainty in Artificial Intelligence (UAI) 2009. [ pdf | poster ]

Samory Kpotufe. Fast, smooth and adaptive regression in metric spaces.
Neural Information Processing Systems (NIPS) 2009. [ pdf ]

Some invited talks

CAL IT2, Information Theory and Applications Workshop. February 2014 + 2013.

Carnegie Mellon, Statistics. October 2013.

ETH (Swiss Federal Institute of Technology) Zurich, ML group. April 2013.

Weierstrass Institute for Applied Analysis and Stochastics. November 2011.

Foundations of Computational Mathematics, Learning Theory Workshop. June 2011.


Professional activities
Senior Committees:

- Editorial Board Member: Journal of Machine Learning Research (2014 to present).
- Area Chair: COLT (2015, 2016), NIPS (2015, 2016), AISTATS (2017).


Journal of Machine Learning Research, IEEE Transactions on Information Theory, IEEE Transactions on Pattern Analysis and Machine Intelligence, Annals of Statistics, ESAIM Probability and Statistics, Neural Information Processing Systems (NIPS), ACM-SIAM Symposium On Discrete Algorithms (SODA), International Conference on Machine Learning (ICML), ...


Biking and basketball; I also like to draw and paint.