Princeton leads efforts to develop national data training framework for high energy physics

For the third consecutive summer, high energy physics graduate students, postdocs and instructors from across the United States, as well as from India, Italy and Switzerland, gathered at Princeton University to attend the school on Tools, Techniques and Methods for Computational and Data Science for High Energy Physics, or CoDaS-HEP, held this year July 22-26.

Princeton physicist Peter Elmer, lead conference organizer and executive director and principal investigator for the NSF-funded Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP), emphasized that the annual CoDaS-HEP school is one component of a multi-tiered project spearheaded by Princeton, with the aim of creating a more advanced software cyberinfrastructure in high energy physics and providing young physicists with essential specialized software skills they’ll need to succeed.

Students and researchers gather around a table

The summer program on Tools, Techniques and Methods for Computational and Data Science for High Energy Physics, or CoDaS-HEP, was held at Princeton for the third consecutive year. Pictured: Henry Schreiner, a computational physicist at Princeton and one of the instructors (second from left) with, from left to right: Lauren Hay, graduate student, University at Buffalo; Andres Quintero, Fermilab; Stephanie Kwan, graduate student, Princeton; and Michael (Tres) Reid, graduate student, Cornell University.

“We must make sure that the research software ecosystem in our field is sustainable over the long term,” said Elmer, “particularly since the upgrades represented by the High-Luminosity Large Hadron Collider, or HL-LHC, and other large scientific facilities of the 2020s will be relevant through at least the 2030s. More than anything else, software sustainability requires people with the right skills who see software as part of their research product.” (The LHC, the powerful proton-smasher buried in a tunnel at CERN, beneath the border of Switzerland and France, aims to generate 10 times the amount of its current data starting in 2026.)

The CoDaS-HEP summer school, which has received funding through at least 2023, is an important part of building a sustainable scientific software development community. Supported by the National Science Foundation and the Princeton Institute for Computational Science and Engineering (PICSciE), and co-sponsored by the Department of Physics and the Office of the Dean for Research, CoDaS-HEP combined hands-on trainings and lectures, led by instructors from Princeton, Cornell, Intel, UC San Diego, New York University and the University of Chicago.

Two students wearing white hard hats look into the machinery of CERN

Reid, left, gains hands-on experience at CERN.

Said Michael “Tres” Reid, a CoDaS-HEP attendee and Cornell graduate student in high energy physics, “I’m helping with a dark matter search at CERN’s Compact Muon Solenoid detector, or CMS, and [am] hoping to be involved with a high-performance computing project upgrading CMS’s track reconstruction software. The idea behind track reconstruction is that charged particles deposit small amounts of energy as they move through each layer of the tracker. We call those energy deposits ‘hits’ and use them to reconstruct each particle’s trajectory. In a few years, the current CMS reconstruction software will have a difficult time keeping up with the collision rate.”

Realizing that his physics background didn’t prepare him for the computer science challenge facing him, Reid decided to attend CoDaS-HEP. “Up until this point, I only ever really needed my code to work, not to necessarily work well,” he said. “Optimizing it was only an afterthought. But in terms of CMS work, the computational learning curve is so steep that I have only been able to do trivial tasks thus far. The lessons at CoDaS-HEP helped me fill in the gaps in my technical knowledge of parallel programming, and I no longer feel like I will become as lost as I dig deeper into the technical aspects of the project.”

Two researchers pose in front of a building

IRIS-HEP fellow Pratyush Das, left, with mentor Jim Pivarski at Fermilab in Illinois.

That kind of feedback is music to the ears of Sudhir Malik, associate professor of physics at the University of Puerto Rico, Mayagüez Campus. Malik is part of IRIS-HEP and collaborating principal investigator (with Elmer) on a related NSF-funded project called “Framework for Integrated Software Training for High Energy Physics,” or FIRST-HEP. Malik and his colleagues are responsible for prototyping various outreach and training ideas such as hackathons and programming workshops for STEM teachers.

Said Malik: “While our core software training model is built around high energy physics institutes, universities and national labs worldwide, it is also well positioned to engage corresponding local communities of K-12 teachers and students in STEM disciplines in cybertraining and software skills. We recently launched several activities, including software workshops, machine learning hackathons and Python programming for STEM teachers, which are all steps in this direction.”

IRIS-HEP is working with The Carpentries, a nonprofit group that teaches foundational coding and data science skills to researchers worldwide, to develop a basic introductory curriculum for high energy physics students that can scale up to run many events during the year across the U.S. and in Europe. The idea is that this basic introductory course will be given to all graduate students as they begin their research activities.

In addition, the institute is running a mentorship project “where we try to connect students and postdocs very interested in research software development with people who can mentor them and connect them with the larger community working on research software projects,” said Elmer. “Often a student or postdoc will not have such a person at their own institution; to address this, the IRIS-HEP software institute has funding for IRIS-HEP fellows to travel and work for three months with such a mentor.”

Pratyush Das is a current IRIS-HEP fellow and CoDaS-HEP attendee, working with Princeton physicist Jim Pivarski this summer.

“I am actually an undergraduate, perhaps the only one attending this school,” said Das, a student at the Institute of Engineering & Management in Kolkata, India.

“Hearing about the discovery of the Higgs boson when I was in middle school was one of the primary events that pushed me towards high energy physics,” Das said. “As an IRIS-HEP fellow, I am currently working on uproot, an implementation of the ROOT software framework in purely Python and the numerical computation library NumPy. It has been a huge success with over 15,000 downloads. After speaking to a lot of participants at the school, most told me that it was something they kept in their toolset.”

Asked what stands out as one of the major benefits of the summer school, Das replied, “Being able to freely interact with stalwarts in the field who were invited to give talks is something that wouldn’t normally be possible.”

Andrea Delgado, a Ph.D. candidate in high energy physics at Texas A&M, agreed that the program was well worth attending. “My research team at my home university specializes in jet reconstruction,” she said. “Jets are very abstract objects that are reconstructed from the quarks and gluons resulting from the high energy proton-proton collisions at hadron colliders. Identifying the particles that compose them and writing the algorithms that cluster those particles is a very challenging task and therefore very computationally intensive.”

“That’s why I am interested in the computational tools that are available to particle physicists,” Delgado said. “We usually don’t get the proper experience in software development. CoDaS-HEP school organizers did a great job in selecting the topics of the school, and bringing in experts in programming and physicists who created tools specifically tailored to HEP. The hands-on exercises really helped the newly acquired information to really sink in. It was a very intense week but it was also nice to be able to collaborate with the other students in solving the exercises that the professors assigned.”

Three researchers talk to each other

From left: Hugo Becerril, Ph.D. candidate at The University of Illinois at Chicago, and Andrea Delgado, Ph.D. candidate at Texas A&M, in conversation with Princeton’s Peter Elmer.

To support the hands-on exercises at CoDaS-HEP and other training events, the IRIS-HEP team at the University of Chicago developed a scalable web-based machine learning platform. This provided a science gateway to CPU and GPU resources available from the Pacific Research Platform, an NSF-supported computing cyberinfrastructure in California. Thirty-four nodes, each with two Nvidia 1080 Ti processors, were accessed by 55 students and lecturers working through a 14-module minicourse in PyTorch, an open-source machine learning framework.

Said Elmer, looking ahead to IRIS-HEP’s growth and the goals for next year’s CoDaS-HEP summer school, “We are doing this because the success of the next phase of this global scientific project depends in part on whether we have the necessary tools to analyze data on an increasingly massive scale.

“Fundamentally this relies on having people with the sophisticated skills to build these advanced software tools. We need to invest today to make sure we have both the software tools and the skilled people we need to continue the search for physics beyond the Standard Model and, should it be discovered, to study its details and implications. I think we’ll be ready.”