How to use OpenMPI with OpenMP or multi-threaded Intel MKL
This FAQ applies to all clusters except for Hecate.
Normally, by following the instructions in each cluster's Tutorial, every processor/core reserved via Slurm is assigned to a separate MPI process. However, when an application combines MPI (usually between nodes) with OpenMP (within nodes), different instructions need to be followed.
One specific example of using OpenMP in an MPI job is when using Intel MKL. MKL internally uses OpenMP to execute mathematical operations in parallel when configured to use the multi-threaded layer (see the MKL configuration instructions). By default, OpenMP and multi-threaded MKL will use all cores in a node, but it may be desirable to allocate fewer than that when an application makes lighter use of MKL or simply needs fewer parallel threads per process.
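As a sketch, the thread count can be set with environment variables before launching the job. OMP_NUM_THREADS applies to any OpenMP code; MKL additionally recognizes MKL_NUM_THREADS, which, if set, overrides OMP_NUM_THREADS for MKL routines only (the value 8 here is just an example):

```shell
# Limit OpenMP (and therefore threaded MKL) to 8 threads per process.
export OMP_NUM_THREADS=8
# Optional MKL-specific override; takes precedence over OMP_NUM_THREADS
# for MKL routines only.
export MKL_NUM_THREADS=8
echo "OpenMP threads per process: $OMP_NUM_THREADS"
```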
In either case, when using OpenMPI, you must configure the number of threads per process as follows:
- Set the number of MPI tasks you require by specifying the number of nodes (#SBATCH -N) and the number of MPI processes you want per node (#SBATCH --ntasks-per-node), then specify the number of OpenMP threads per MPI process (#SBATCH -c).
- Set OMP_NUM_THREADS to the number of OpenMP threads to be created for each MPI process.
- Invoke srun as normal.
Example: This job will have 6 MPI processes (2 per node), each with 8 OpenMP threads, for a total of 48 cores. This would fill three 16-core nodes, as on Tiger.
#SBATCH -N 3
#SBATCH --ntasks-per-node=2
#SBATCH -c 8
module load openmpi
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./a.out    # replace with your executable
Note: (ntasks-per-node * c) must not exceed the number of cores per node on the particular machine.
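The arithmetic behind the example and this constraint can be checked in the shell (using the example's numbers: 3 nodes of 16 cores, 2 tasks per node, 8 threads per task):

```shell
# Example allocation: 3 nodes, 2 MPI tasks per node, 8 threads per task.
NODES=3
TASKS_PER_NODE=2
CPUS_PER_TASK=8
CORES_PER_NODE=16    # e.g. Tiger

TOTAL_TASKS=$((NODES * TASKS_PER_NODE))
TOTAL_CORES=$((TOTAL_TASKS * CPUS_PER_TASK))
echo "$TOTAL_TASKS MPI tasks, $TOTAL_CORES cores in total"

# Per-node constraint: tasks-per-node * cpus-per-task <= cores-per-node.
if [ $((TASKS_PER_NODE * CPUS_PER_TASK)) -le "$CORES_PER_NODE" ]; then
    echo "per-node allocation fits"
fi
```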
Note: If you are not using MPI and are simply running a single-process (but multi-threaded) job, then follow the above instructions and set '--ntasks-per-node' to 1.