Program in Statistics and Machine Learning

Director

Kosuke Imai

Executive Committee

Jianqing Fan, Operations Research and Financial Engineering

Kosuke Imai, Politics

John D. Storey, Molecular Biology, Lewis-Sigler Institute for Integrative Genomics

Associated Faculty

Yacine Ait-Sahalia, Economics

Sanjeev Arora, Computer Science

Sebastian Bubeck, Operations Research and Financial Engineering

Mung Chiang, Electrical Engineering

Jonathan D. Cohen, Psychology, Princeton Neuroscience Institute

Paul W. Cuff, Electrical Engineering

David P. Dobkin, Computer Science

Bo E. Honore, Economics

Michal Kolesar, Woodrow Wilson School, Economics

Sanjeev R. Kulkarni, Electrical Engineering

Han Liu, Operations Research and Financial Engineering

Ulrich K. Mueller, Economics

H. Vincent Poor, Electrical Engineering

Peter J. Ramadge, Electrical Engineering

Marc Ratkovic, Politics

Matthew J. Salganik, Sociology

H. Sebastian Seung, Computer Science, Princeton Neuroscience Institute

Christopher A. Sims, Economics

Amit Singer, Mathematics, Applied and Computational Mathematics

Mona Singh, Computer Science, Lewis-Sigler Institute for Integrative Genomics

Michael A. Strauss, Astrophysical Sciences

Olga G. Troyanskaya, Computer Science, Lewis-Sigler Institute for Integrative Genomics

Ramon van Handel, Operations Research and Financial Engineering

Sergio Verdu, Electrical Engineering

Mark W. Watson, Woodrow Wilson School, Economics

Sits with Committee

Andrew Conway, Psychology

Germán Rodriguez, Population Research


Information and Departmental Plan of Study

The Program in Statistics and Machine Learning is offered by the Center for Statistics and Machine Learning. The program is designed for students in any department of concentration who have a strong interest in data analysis and its application across disciplines. Statistics and machine learning, the academic disciplines centered on developing and understanding data analysis tools, play an essential role in many scientific fields, including biology, engineering, and the social sciences. This new field of "data science" is interdisciplinary, merging contributions from computer science and statistics and addressing numerous applied problems. Examples of data analysis problems include analyzing massive quantities of text and images, modeling cell-biological processes, pricing financial assets, evaluating the efficacy of public policy programs, and forecasting election outcomes. In addition to its importance in scientific research and policy making, the study of data analysis poses its own theoretical challenges, such as developing methods and algorithms for making reliable inferences from high-dimensional and heterogeneous data. The program provides students with the tools required to address these emerging challenges. Through the program, students learn basic theoretical frameworks and apply statistics and machine learning methods to a wide range of problems.

Admission to the Program

Students are admitted to the program after they have chosen a concentration, generally by the beginning of their junior year. At that time, students must have prepared a tentative plan and timeline for completing all of the program's requirements, including required courses and independent work (as outlined below), as well as any prerequisites for the selected courses. For enrollment or questions, contact Tara Zigler, program manager.

Program of Study

Students are required to take a total of five courses and earn at least a B- in each: one of the "Foundations of Statistics" courses, one of the "Foundations of Machine Learning" courses, and three elective courses. With all necessary permissions, advanced students may also take approved graduate-level courses. Students may count at most two courses from their departmental concentration or from another certificate program toward this certificate program.

Students are also required to complete a thesis or at least one term of independent work in their junior or senior year on a topic that makes substantial application or study of machine learning or statistics. This work may be used to satisfy the requirements of both the program and the student's department of concentration. The work is due on the same date as the student's departmental deadline for the thesis or junior independent work. All work will be reviewed by the Statistics and Machine Learning certificate committee. At the end of each academic year, there is a public poster session at which students present their work to one another, to other students, and to the faculty.

Finally, students are encouraged to attend at least one of the Statistics and Machine Learning colloquia on campus. These include the Wilks Statistics Seminar, the Machine Learning Seminar, the Political Methodology Seminar, and the Quantitative and Computational Biology Seminar.

Courses

One of the following courses (Foundations of Statistics)

ECO 202 Statistics and Data Analysis for Economics
EEB 355 Introduction to Statistics for Biology (also MOL 355)
ORF 245 Fundamentals of Engineering Statistics
POL 345 Quantitative Analysis and Politics
PSY 251 Quantitative Methods
WWS 200 Statistics for Social Science

One of the following courses (Foundations of Machine Learning)

COS 424 Interacting with Data
ORF 350 Analysis of Big Data

Three of the following courses (electives; with permission, courses listed above may also be counted)

Machine Learning
COS 401 Introduction to Machine Translation
COS 402 Artificial Intelligence
ELE 218 Learning Theory and Epistemology
ORF 418 Optimal Learning

Theory
MAT 385 Probability Theory
ORF 309 Probability and Stochastic Systems
ORF 473 Special Topics in Operations Research and Financial Engineering: Stochastic Calculus

Applied Statistics
ECO 302 Econometrics
ECO 312 Econometrics: A Mathematical Approach
ECO 313 Econometric Applications
ELE 480 fMRI Decoding: Reading Minds Using Brain Scans
ELE 486 Transmission and Compression of Information
GEO 422 Data, Models, and Uncertainty in the Natural Sciences
MOL 436 Statistical Methods for Genomic Data
ORF 405 Regression and Time Series
POL 346 Applied Quantitative Analysis

Certificate of Proficiency

Students who fulfill the program requirements receive a certificate upon graduation.