ELE522: Large-Scale Optimization for Data Science
The term project can either be a literature review or include original research:
Literature review. We will provide a list of related papers not covered in the lectures, and the literature review should involve in-depth summaries and exposition of one of these papers.
Original research. It can be either theoretical or experimental (ideally a mix of the two), subject to approval from the instructor. If you choose this option, you can do it either individually or in groups of two. You are encouraged to combine your current research with your term project.
There are two milestones/deliverables to help you through the process.
Proposal (due Nov. 6). Submit a short report (no more than 1 page) stating the papers you plan to survey or the research problems that you plan to work on. Describe why they are important or interesting, and provide some appropriate references. If you elect to do original research, please do not propose an overly ambitious project that cannot be completed by the end of the semester, and do not be lured too much by generality. Focus on the simplest scenarios that can capture the issues you'd like to address.
A written report (due Jan. 13). You are expected to submit a final project report (up to 5 pages, with an unlimited appendix) summarizing your findings. You must email an electronic copy to both the TA and me.
A few suggested (theoretical) papers for literature review (to be updated)
‘‘On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems,’’ T. Lin, C. Jin, and M. Jordan, 2019.
‘‘Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent,’’ A. Dalalyan, Conference on Learning Theory, 2017.
‘‘Variance-reduced Q-learning is minimax optimal,’’ M. Wainwright, 2019.
‘‘Stochastic approximation with cone-contractive operators: Sharp ℓ∞-bounds for Q-learning,’’ M. Wainwright, 2019.
‘‘Sharp analysis for nonconvex SGD escaping from saddle points,’’ C. Fang, Z. Lin, and T. Zhang, 2019.
‘‘Spider: Near-optimal nonconvex optimization via stochastic path-integrated differential estimator,’’ C. Fang, C. Li, Z. Lin, T. Zhang, 2018.
‘‘Plug-and-Play Methods Provably Converge with Properly Trained Denoisers,’’ E. Ryu, J. Liu, S. Wang, X. Chen, Z. Wang, W. Yin, 2019.
‘‘Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds,’’ X. Chen, J. Liu, Z. Wang, W. Yin, 2018.
‘‘What Can ResNet Learn Efficiently, Going Beyond Kernels?’’ Z. Allen-Zhu, Y. Li, 2019.
‘‘Can SGD Learn Recurrent Neural Networks with Provable Generalization?’’ Z. Allen-Zhu, Y. Li, 2019.
‘‘Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence,’’ V. Charisopoulos, Y. Chen, D. Davis, M. Diaz, L. Ding, D. Drusvyatskiy, 2019.
‘‘Stochastic methods for composite and weakly convex optimization problems,’’ J. Duchi, F. Ruan, SIAM Journal on Optimization, 2018.
‘‘EXTRA: An Exact First-Order Algorithm for Decentralized Consensus Optimization,’’ W. Shi, Q. Ling, G. Wu, W. Yin, 2014.
‘‘Convergence Analysis of Alternating Direction Method of Multipliers for a Family of Nonconvex Problems,’’ M. Hong, Z.-Q. Luo, and M. Razaviyayn, 2016.
‘‘How to Escape Saddle Points Efficiently,’’ C. Jin, R. Ge, P. Netrapalli, S. Kakade, M. Jordan, 2017.
‘‘Natasha 2: Faster Non-Convex Optimization Than SGD,’’ Z. Allen-Zhu, 2017.
‘‘A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights,’’ W. Su, S. Boyd, and E. J. Candes, 2015.
‘‘A Geometric Alternative to Nesterov's Accelerated Gradient Descent,’’ S. Bubeck, Y. T. Lee, M. Singh, 2015.
‘‘Faster Rates for the Frank-Wolfe Method over Strongly-Convex Sets,’’ D. Garber, E. Hazan, 2014.
‘‘The landscape of empirical risk for nonconvex losses,’’ S. Mei, Y. Bai, and A. Montanari, 2016.
‘‘Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima,’’ P. Loh and M. Wainwright, 2013.
‘‘Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion and Blind Deconvolution,’’ C. Ma, K. Wang, Y. Chi, and Y. Chen, 2017.
‘‘Adaptive restart for accelerated gradient schemes,’’ B. O'Donoghue and E. Candes, 2015.
‘‘Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solutions for Nonconvex Distributed Optimization,’’ M. Hong, J. Lee, M. Razaviyayn, 2018.
‘‘Mirror descent in nonconvex stochastic programming,’’ Z. Zhou, P. Mertikopoulos, N. Bambos, S. Boyd, P. Glynn, 2017.
‘‘Gradient Descent Can Take Exponential Time to Escape Saddle Points,’’ S. Du, C. Jin, J. Lee, M. Jordan, B. Poczos, A. Singh, 2017.
‘‘Accelerating Stochastic Gradient Descent,’’ P. Jain, S. Kakade, R. Kidambi, P. Netrapalli, A. Sidford, 2017.
‘‘Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming,’’ S. Ghadimi, G. Lan, 2013.
‘‘On the Connection Between Learning Two-Layers Neural Networks and Tensor Decomposition,’’ M. Mondelli, A. Montanari, 2018.
‘‘Learning One-hidden-layer Neural Networks with Landscape Design,’’ R. Ge, J. Lee, T. Ma, 2017.
‘‘Gradient Descent Learns Linear Dynamical Systems,’’ M. Hardt, T. Ma, B. Recht, 2016.
‘‘Memory-efficient Kernel PCA via Partial Matrix Sampling and Nonconvex Optimization: a Model-free Analysis of Local Minima,’’ J. Chen, X. Li, 2017.
‘‘On the Sublinear Convergence of Randomly Perturbed Alternating Gradient Descent to Second Order Stationary Solutions,’’ S. Lu, M. Hong, Z. Wang, 2018.
‘‘On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization,’’ S. Arora, N. Cohen, E. Hazan, 2018.
‘‘Characterizing Implicit Bias in Terms of Optimization Geometry,’’ S. Gunasekar, J. Lee, D. Soudry, N. Srebro, 2018.
‘‘An Alternative View: When Does SGD Escape Local Minima?’’ R. Kleinberg, Y. Li, Y. Yuan, 2018.
‘‘Stochastic Cubic Regularization for Fast Nonconvex Optimization,’’ N. Tripuraneni, M. Stern, C. Jin, J. Regier, M. Jordan.
You have the freedom to select a paper of your own interest (especially more practical papers), upon the instructor's approval.
