Princeton wants to shake up how graduate students learn the basic tools and principles of writing good code for computationally intensive scientific research, and Gabe Perez-Giz is emphatically on board.
Perez-Giz is the newest instructor of APC 524, “Software Engineering for Scientific Computing,” offered through the Program in Applied and Computational Mathematics (PACM) and required as a core course for the popular graduate certificate program in computational and information science (CIS). The course is administered by the Princeton Institute for Computational Science and Engineering (PICSciE) with James Stone, the Lyman Spitzer, Jr. Professor of Astrophysical Sciences, as the CIS graduate certificate director. Stone is also chair of the Department of Astrophysical Sciences and professor of applied and computational mathematics.
Perez-Giz has been rethinking fundamental questions about how to train students to write code that is easy to maintain and share with others.
“The code bases in many research groups get pretty messy over time,” said Perez-Giz, an astrophysicist, educator and science communicator who was the first writer and host of PBS Space Time, a PBS Digital Studios show on YouTube about astrophysics and space policy.
“In part, that happens because it’s very hard and time-consuming for incoming graduate students, or even those in their third or fourth year, to learn effective software development practices. While the course description focuses on the topics I’ll cover, such as testing, debugging and parallel computing, the longer-term goal of this course is to effect a computing culture-change in order to stanch the cycle of bad programming habits and lay a foundation for less buggy and better documented software going forward.”
Perez-Giz said that many students with prior programming experience are nonetheless accustomed to interacting with computers primarily under the point-and-click paradigm popularized by web browsers, word processors and smartphones. That can make text-driven software development tools seem extra foreign. “When you've spent most of your life tapping on icons or touching a mouse or using Outlook, Word and Excel, it's a massive lifestyle shift to recenter your modus operandi around plain text files, a keyboard rather than a mouse, and a powerful text editor instead of a word processor.”
Many of today’s coding courses and other instructional resources assume that students are already familiar with how a Linux/Unix operating system runs and how to drive a computer from the command line comfortably, but that’s often not the case, explained Perez-Giz.
“A book or web page describing some higher-level concept can't plug a hole in this more base-level knowledge without digressing for days, and so people often get stuck or learn things in patchwork ways because they lack this ground-up familiarity with Linux and command line basics,” he said. “I'd like to see us break this cycle of half-learning by covering this foundational material systematically, maybe before graduate coursework begins.”
“Gabe shares our philosophy at PICSciE, particularly our educational mission and [its role] as an interdisciplinary institute,” said Florevel (Floe) Fusin-Wischusen, manager of the institute, who also oversees its education, training and outreach program. “A core principle driving our education, training and outreach program — including our certificate program in CIS, year-round supplemental trainings and mini-courses, help sessions and seminars — is that we must always provide our students a solid grounding in the fundamental best practices for scientific computing, and address the holes in their knowledge rather than glossing over them. These training opportunities play an important role for students to quickly acquire knowledge in advanced research topics and gain hands-on experience.”
Jeroen Tromp, director of PICSciE, added, “Princeton is doing something unique by giving our graduate students and researchers a competitive edge.
“We want them to leave here knowing how to develop scientific software with an eye toward reproducibility and reusability, not just expedient quick-fixes,” said Tromp, who is the Blair Professor of Geology and professor of geosciences and applied and computational mathematics.
As PACM director Peter Constantin noted, “The success of the course shows that the need for sustaining modern computational skills and mastering best computational practices is greatly appreciated by Princeton's research community.
“PACM is the leading applied mathematics program in the country due in large measure to the excellence of its students,” added Constantin, who is the John von Neumann Professor in Applied and Computational Mathematics. “The program fosters collaboration and interaction across fields, and APC 524 was created in order to support computational graduate research broadly construed.”
Perez-Giz, who received his Ph.D. in physics from Columbia University and was a National Science Foundation postdoctoral fellow in astronomy and astrophysics at New York University, said that his own formal training in writing code was close to zero. “I didn’t get more serious about it until grad school, when I had some friends who were really into Linux and emacs. I got some parts and built a computer and set it up as a server. In short, I spent a few months not sleeping a lot, asking questions, and trying to find resources that would clarify both the ‘how’ and the ‘why’ of certain practices and tools.”
People typically don’t act on learning a topic or technique until they're faced with a need for it, added Perez-Giz. “By that point, there's usually a deadline, and now you're scrambling to pick things up piecemeal just to get things done, so most of us never get the breathing room to learn the 'right way' to do things.”
This is why, as an instructor, he gives his students use cases in a low-stakes, project-driven setting. Currently, the main assignment in APC 524 is a capstone project that students work on in small groups, developing what Perez-Giz calls “a small but non-trivial piece of scientific software, often related to their own research.” For instance, students might work on code that performs calculations to help with laser design, or some image-processing software to automate gene-expression analysis in fly embryos. These projects cover a wide range of disciplines, with the common thread being the tools and best practices followed while writing them.
Paul Kaneelil can speak to the effectiveness of this approach. A first-year doctoral student in mechanical and aerospace engineering, Kaneelil was looking for a course that would give him a foundation in a wide variety of tools pertaining to version control, profiling and debugging, as a precursor to learning computational fluid dynamics.
Explained Kaneelil: “In APC 524, my six-member group project was developing a genetic algorithm that optimizes structural design. The goal of our code was to design a structure that meets the user-specified boundary and load conditions while minimizing cost and weight of the material, and maximizing factor of safety. I think the most exciting tool we used was version control using Git/GitHub. It allowed group members to simultaneously work on different parts of the project and then combine them together seamlessly. This was extremely helpful for us since we had lots of files that we were working with. By learning and actively using these tools, I am able to streamline the computing process and improve the overall quality of my scientific computing projects.”
“I've come to view these hands-on projects as a form of vaccination against bad habits,” said Perez-Giz. “I tell students, ‘Here are some key tasks that you will encounter as you write scientific code, and you will be tempted to do them a hacky way or avoid them altogether, so let's make you aware of a better way so that doesn’t happen.’”