How to use OpenMPI with OpenMP or multi-threaded Intel MKL
This FAQ applies to all clusters except for Hecate.
Normally, by following the instructions in each cluster's Tutorial, every processor/core reserved via PBS is assigned to a separate MPI process. However, when an application combines MPI (usually between nodes) and OpenMP (within nodes), different instructions must be followed.
One specific example of using OpenMP in an MPI job is Intel MKL. When configured to use the multi-threaded layer, MKL internally uses OpenMP to execute mathematical operations in parallel (see the MKL configuration instructions). By default, OpenMP and multi-threaded MKL will use all cores in a node, but it may be desirable to allocate fewer when an application makes lighter use of MKL or simply needs fewer parallel threads per process.
In either case, when using OpenMPI, you must configure the number of threads per process as follows:
- Set the number of cores allocated by PBS to equal the number needed for all processes and threads. Generally it is best policy to completely fill nodes. Therefore set the PBS ppn parameter to equal the number of cores per node (i.e., 12 or 16), and the nodes parameter to the total number of cores needed divided by the number of cores per node.
- Set OMP_NUM_THREADS to the number of OpenMP threads to be created for each MPI process.
- Invoke mpirun specifying the number of MPI processes wanted, along with the '-bynode' command line option.
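The node count for the first step can be worked out with a little shell arithmetic. This is only a minimal sketch, not part of any cluster's tooling: the variable names CORES_PER_NODE, NP, TOTAL_CORES and NODES are hypothetical, and the values are chosen to match the example that follows.

```shell
# Hypothetical input values; adjust them for your own job.
CORES_PER_NODE=12    # cores per node on the cluster (12 or 16)
NP=8                 # number of MPI processes wanted
OMP_NUM_THREADS=6    # OpenMP threads per MPI process

# Total cores needed for all processes and threads.
TOTAL_CORES=$((NP * OMP_NUM_THREADS))

# Divide by cores per node, rounding up so a partly used node is still reserved whole.
NODES=$(( (TOTAL_CORES + CORES_PER_NODE - 1) / CORES_PER_NODE ))

# Print the resource request line to place in the PBS script.
echo "#PBS -l nodes=${NODES}:ppn=${CORES_PER_NODE}"
```

If the total does not divide evenly by the cores per node, the rounded-up reservation leaves some cores idle, so it is usually worth adjusting the process or thread count until the nodes are filled.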
Example: There are 12 cores per node, and you want 8 MPI processes, each running 6 OpenMP threads.
PBS script:
#PBS -l nodes=4:ppn=12
export OMP_NUM_THREADS=6
mpirun -bynode -np 8 myhybridcode.exe
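Note: The reservation must satisfy (nodes * ppn) == (OMP_NUM_THREADS * np), where np is the value passed to mpirun with '-np'. In the example, 4 * 12 == 6 * 8 == 48.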
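One way to catch a mismatch before submitting is to check this relation with shell arithmetic. The sketch below is only illustrative: NODES, PPN, NP, RESERVED, NEEDED and LAYOUT_OK are hypothetical variable names, and the values are copied by hand from the example script rather than read from PBS.

```shell
# Hypothetical values mirroring the example; edit them to match your own script.
NODES=4
PPN=12
NP=8
OMP_NUM_THREADS=6

RESERVED=$((NODES * PPN))            # cores requested from PBS
NEEDED=$((NP * OMP_NUM_THREADS))     # cores used by all MPI processes and their threads

if [ "$RESERVED" -eq "$NEEDED" ]; then
    LAYOUT_OK=yes
else
    LAYOUT_OK=no
    echo "Mismatch: reserved $RESERVED cores but need $NEEDED" >&2
fi
echo "layout ok: $LAYOUT_OK"
```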
Note: If you are not using MPI and are simply running a single-process job, then follow the above instructions and set 'nodes' and 'np' to 1.
