Events - Daily
Thursday, August 01
CANCELLED: Virtual School of Computational Science and Engineering - Proven Algorithmic Techniques for Many-core Processors
From studying many current GPU computing applications, we have learned that the limits of an application's scalability are often related to some combination of memory bandwidth saturation, memory contention, imbalanced data distribution, and data structure/algorithm interactions. Successful GPU application developers often adjust their data structures and problem formulation specifically for massive threading, and execute their threads using shared on-chip memory resources for greater impact. We looked for patterns among those transformations and present here the seven most common and crucial algorithm and data optimization techniques we discovered. Each can improve the performance of applicable kernels by 2-10X on current processors while improving future scalability.
Prerequisites:
Experience working in a Unix environment
Experience developing and running scientific codes written in C or C++
Basic knowledge of CUDA (a short online course, Introduction to CUDA, is available to registered on-site students who need assistance in meeting this prerequisite)
Although not required, knowledge from "Programming Heterogeneous Parallel Computing Systems," offered July 9-13 this year, is highly recommended.
Instructors:
Wen-Mei W. Hwu, professor of electrical and computer engineering and principal investigator of the CUDA Center of Excellence, University of Illinois at Urbana-Champaign
David Kirk, NVIDIA fellow
John Stratton, Ph.D. candidate in electrical and computer engineering and author of the exercise solutions to "Programming Massively Parallel Processors - A Hands-on Approach"
Topics:
why problem formulation and algorithm design choices can have a dramatic effect on performance
common algorithmic strategies for high performance
Increasing locality in dense arrays
  tiling of data access and layout
Reducing output interference
  conversion from scatter to gather
  parallelizing reductions and histograms
Dealing with non-uniform data
  data sorting and binning
Dealing with sparse data
  sorting and compaction
Dealing with dynamic data
  parallel queue-based algorithms
Improving data efficiency in large data traversal
  stencil and other grid-based computation
Case studies from application domains
  computational fluid dynamics
NOTE: Students are required to provide their own laptops.
349 Lewis Library · 9:30 a.m.–5:00 p.m.