Jul 26, 2016 · 10:00 a.m.–11:00 a.m. · 245 Lewis Science Library
Looking for some help getting started? Can’t get your code to run?
We offer an open help session every Tuesday morning from 10:00-11:00 in 245 Lewis Science Library.
This is an opportunity to meet with research computing staff for one-on-one help with all things cluster-related. Topics include, but are certainly not limited to:
• Getting started on the cluster(s)
• Navigating the file systems
• Understanding and troubleshooting error messages
• Transferring and storing data
• Installing and compiling software
• Writing SLURM submission scripts
• Improving performance
• Programming strategies
• And many more…
Think of this as CSES office hours: no appointment necessary. We are also available outside of these hours; please email firstname.lastname@example.org to schedule an appointment.
*** Time and room change due to renovation of 346 Lewis (Vis Lab)
Jul 26, 2016 · 2:00 p.m.–4:00 p.m. · Princeton Center for Theoretical Science, 407 Jadwin Hall
Data analysis is an integral part of most research projects in an academic environment, and the ability to analyze data quickly and efficiently is a prerequisite for successful research. In the fast-paced world of Big Data, Apache Hadoop's MapReduce framework for batch processing has been outgrown by Apache Spark, which claims speeds 10-100x faster than Hadoop MapReduce and set a world record in large-scale sorting.
This workshop requires a programming background and experience with Python at an *intermediate* level. All exercises will use a mix of PySpark and Spark SQL (both part of Apache Spark), but previous experience with Spark or distributed computing is not required. Registrants are strongly encouraged to take Introduction to Python for Scientific Computing or Introduction to Programming Using Python (offered by PICSciE) to learn or refresh their Python knowledge.
This training will describe what is expected of scientists performing data-intensive research and then teach how to use the Spark and Hadoop software stack to meet those expectations. It will also cover a set of mini case studies, including Web Mining, Text Classification and Image Processing exercises, that teach students how to manipulate data sets using distributed processing with PySpark and Spark SQL. Upon completion of this course, participants will be able to build and tune their own high-performance Spark applications.
Alexey Svyatkovskiy is a Big Data, Software and Programming Analyst with the Princeton Institute for Computational Science & Engineering (PICSciE). He holds a PhD in particle physics and has over 5 years of experience in large scale data analysis, physics analysis, and machine learning for the CMS experiment at the CERN Large Hadron Collider.
Please register online at the training website, www.princeton.edu/training, or contact Andrea Rubinstein at email@example.com / 258-1397.
• Big Data: the big picture
• Spark software stack, Hadoop
• Spark programming model: transformations and actions
• Core data structures: RDDs and DataFrames
• Migrating to Spark from Python
• Ingesting data into RDDs or DataFrames
• Running Spark via Slurm on a cluster
• Spark library: Spark SQL
• Spark ML - Spark's machine learning library
• Tuning performance of Spark applications: caching, partitioning and broadcasting
• Building Spark applications with the Scala API
Jul 28, 2016 · 2:00 p.m.–3:00 p.m. · 245 Lewis Science Library
We offer an open, walk-in help session every Thursday afternoon from 2:00-3:00 p.m. in 245 Lewis Science Library. No appointment necessary.
For help at other times, please email firstname.lastname@example.org.
The Help Session is an opportunity to meet with research computing staff for one-on-one help with data visualization and programming. We can discuss visualization programs, techniques, and data formats, as well as programming and cluster usage, and in particular how to display your data effectively.
If you are working with large amounts of data in the Princeton High Performance Computing environment, you can learn about remote visualization using tigressdata.princeton.edu.
***Room change due to renovation of 346 Lewis (Vis Lab)