PrincetonPy Group Talks: Introduction to Pandas - From CSV to Plot - Easy ways to do your data mining
May 4, 2016 · 4:00 p.m.– 5:00 p.m. · Room 346, Visualization Lab
This session presents the basics of Pandas, a Python library that helps you load files into data frames and work with them. By the end of the session you should be able to get started with Pandas and get rid of your Excel spreadsheets.
Bring your own .csv or Excel files.
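To give a flavor of the CSV-to-plot workflow, here is a minimal sketch. It is not the session's material: the sample data, the column names (`date`, `site`, `temperature`), and the function names are hypothetical, and the plotting step assumes matplotlib is installed.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for "your own .csv file".
CSV_TEXT = """date,site,temperature
2016-05-01,A,14.2
2016-05-01,B,15.1
2016-05-02,A,16.0
2016-05-02,B,15.8
2016-05-03,A,13.5
2016-05-03,B,14.9
"""


def load(csv_source):
    """Read a CSV (path or file-like object) into a DataFrame, parsing dates."""
    return pd.read_csv(csv_source, parse_dates=["date"])


def summarize(df):
    """Mean temperature per site, the kind of summary often built by hand in a spreadsheet."""
    return df.groupby("site")["temperature"].mean()


def wide_table(df):
    """Reshape to one row per date and one column per site."""
    return df.pivot(index="date", columns="site", values="temperature")


def plot_by_site(df, path="temperature.png"):
    """Draw one line per site and save the figure to a file (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file, no display required
    ax = wide_table(df).plot(title="Temperature by site")
    ax.set_ylabel("temperature")
    ax.figure.savefig(path)
    return path
```

With your own data, pass a file path such as `load("my_data.csv")` instead of `io.StringIO(CSV_TEXT)`; `pd.read_csv` accepts either. Then `summarize(df)` gives the per-group means and `plot_by_site(df)` writes the chart.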
May 5, 2016 · 2:00 p.m.– 3:00 p.m. · Room 346, Visualization Lab
We offer an open, walk-in help session every Thursday afternoon from 2:00 to 3:00 p.m. in Room 346, Lewis Library. No appointment is necessary.
For help at other times, please email firstname.lastname@example.org.
The Help Session is an opportunity to meet with research computing staff for one-on-one help with data visualization and programming. We can discuss visualization programs, techniques, and data formats, as well as programming and cluster usage, and in particular how to display your data effectively.
If you are working with large amounts of data in the Princeton High Performance Computing environment, you can learn about remote visualization from tigressdata.princeton.edu.
May 18, 2016 · 5:00 p.m.– 6:00 p.m. · Room 346, Visualization Lab
PrincetonPy Meetup: Interactive Real-time Streaming Applications with Spark 2.0's Structured Streaming
May 25, 2016 · 7:00 p.m.– 8:30 p.m. · 120 Lewis Science Library
Speaker: Miles Yucht, Software Engineer, Databricks
Apache Spark 2.0 introduces Structured Streaming, a new API for performing data analysis on real-time data sources. Built on Spark's DataFrame and Dataset APIs, Structured Streaming lets developers write streaming applications that take advantage of Spark's powerful Catalyst and Tungsten optimization engines while still providing the resilience of RDDs.
In this talk, I'll discuss Structured Streaming: why it makes real-time data analysis faster and easier, how it works under the hood, and how to take advantage of these new tools in Python applications. Through a demo in Databricks Community Edition using the high-level DataFrame and Dataset APIs, I will show how simple it is to write a Structured Streaming application and interactively analyze real-time data sets.
Miles Yucht is a software engineer at Databricks, where he has been working on a team developing a highly multi-tenant, scalable version of Databricks that allows users from many different organizations to use Databricks simultaneously on a single server. It was released as Databricks Community Edition at Spark Summit East 2016. He graduated from Princeton University with an A.B. in Computer Science.
Join the new PrincetonPy meetup group at http://www.meetup.com/PrincetonPy-Meetup/.
Princeton University Python Community: http://princetonpy.com/about-pupc/