Advanced Data Analysis Techniques with Apache Spark, 3/21/17
Data analysis is an integral part of most research projects in an academic environment, and the ability to analyze data quickly and efficiently is a prerequisite for successful research. In the modern, fast-paced world of Big Data, Apache Hadoop's MapReduce framework for batch processing has been superseded by Apache Spark, which has been shown to run iterative algorithms up to 100x faster than Hadoop MapReduce and has set a world record in large-scale sorting.
This mini course requires a programming background with Python experience at an *intermediate* level; a basic understanding of the Scala programming language is desirable. All exercises will use a mix of PySpark, Spark SQL and Spark ML (all components of Apache Spark), but previous experience with Spark or distributed computing is not required. Registrants are strongly encouraged to take Introduction to Programming Using Python (offered by PICSciE) to learn or refresh their Python knowledge.
This training will teach participants how to use Spark on high-performance computing (HPC) clusters. It will also cover a set of mini case studies, including Web Mining, Natural Language Processing and Image Processing exercises, that show students how to manipulate data sets using distributed processing with Spark. Upon completing this course, participants will be able to create their own performance-optimized Spark applications.
Alexey Svyatkovskiy is a Big Data, Software and Programming Analyst with the Princeton Institute for Computational Science & Engineering (PICSciE). He holds a PhD in particle physics and has over 5 years of experience in large-scale data analysis and machine learning. His work has been presented at the IEEE Big Data conference and at Spark Summit.
Please register online at the training website, www.princeton.edu/training, or contact Andrea Rubinstein at firstname.lastname@example.org or 258-1397.
Location: Room 347, Visualization Lab
Date/Time: 03/21/17, 11:00 am to 4:30 pm