Programs are scheduled to run on Della using the sbatch command, a component of Slurm. Your job will be put into the appropriate quality of service (QOS) based on the time limit requested. For further information, see the Usage Guidelines section and the sbatch man page.
Programming On Della
The GNU and Intel compilers are installed on Della. The standard MPI implementation for Della is OpenMPI, an MPICH-compatible library that supports the InfiniBand infrastructure.
To set up your environment correctly on Della, it is highly recommended to use the module command. This utility sets your environment correctly without your having to know the paths to the executables. Different environments can be switched quickly, allowing useful comparisons of code built with different compilers. In most cases a simple module load openmpi command will set up your environment to use the latest OpenMPI as well as the Intel compilers.
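As a quick sketch of typical module usage (the exact module names available vary by cluster, so check module avail first):

```shell
module avail          # list the software environments available on this cluster
module load openmpi   # load the default OpenMPI (and the Intel compilers)
module list           # show which modules are currently loaded
module purge          # unload everything, to start from a clean environment
```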
Compiling parallel MPI programs
module load openmpi (loads the openmpi environment as well as the Intel compilers)
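For example, a short sketch of compiling an MPI program with the wrapper compilers that the openmpi module puts on your path (the source file names here are placeholders):

```shell
module load openmpi                   # sets up OpenMPI and the Intel compilers
mpicc  hello_mpi.c   -o hello_mpi     # compile a C MPI program
mpif90 hello_mpi.f90 -o hello_mpi     # compile a Fortran MPI program
```

The wrappers add the correct MPI include paths and libraries automatically, so no extra -I or -L flags are needed.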
Submitting a Job
Once the executable is compiled, a job script will need to be created for the scheduler. This cluster has a total of 128 nodes with 12 cores per node and 96 nodes with 20 cores per node, with a mix of installed memory. To submit only to the 20-core ivybridge nodes, add the following flag: #SBATCH -C ivy. To submit only to the 20-core haswell nodes, add: #SBATCH -C haswell. To submit to either a 20-core ivybridge or a 20-core haswell node, add: #SBATCH -C "ivy|haswell".
Here is a sample script which uses 48 cores allocated as 12 cores on 4 nodes:
#!/bin/bash
# parallel job using 48 cores (12 cores on each of 4 nodes)
# and runs for 4 hours (max)
#SBATCH -N 4 # node count
#SBATCH --ntasks-per-node=12
#SBATCH -t 4:00:00
# sends mail when process begins, and
# when it ends. Make sure you define your email
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --mail-user=yourNetID@princeton.edu
module load openmpi
# problems with mpiexec/mpirun so use srun
srun ./a.out
(Change "yourNetID" to your NetID in the above script.)
To submit the job to the batch queuing system, use:
sbatch <script name>
While the above works fine for parallel MPI jobs, to run an OpenMP job or other threaded application one must change the srun parameters. Since srun launches one process per allocated core, for these types of jobs you only want to launch a single process and allow its threads to use all the allocated cores. To launch this type of job you can use:
srun -n 1 --cpu-bind=none a.out
or, better yet, simply don't use srun at all for serial or multithreaded jobs.
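Putting that together, a minimal sketch of a multithreaded job script (the core count and time limit here are assumptions; adjust them to your job and to the node type you request):

```shell
#!/bin/bash
# multithreaded (OpenMP) job: one task, many CPUs on a single node
#SBATCH -N 1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12    # one thread per allocated core (12-core node assumed)
#SBATCH -t 1:00:00
# let the threads use every core Slurm allocated to the task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# launch the single process directly; no srun needed for a threaded job
./a.out
```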
Compiling Vectorized code with Intel Compilers
Due to the mixed architecture of Della, using the Intel compiler flag -xhost can result in poor performance or error messages. Compiling with -xhost on the head node (an ivybridge) will produce code optimized for the ivybridge processors. As a result, when run on the older westmere nodes, the executable will fail with an error message similar to: "Please verify that both the operating system and the processor support Intel(R) F16C and AVX1 instructions." When run on the newer haswell nodes, the code will run below optimal performance. The recommended solution is to use the -ax flag to tell the compiler to build a binary with instruction sets for each architecture and choose the best one at runtime. For example, instead of -xAVX or -xhost, use: -axCORE-AVX2,AVX,SSE4.2.
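As a concrete sketch, compiling with the recommended flag might look like this (the source and output file names are placeholders):

```shell
# build one binary containing code paths for each of Della's architectures
# (haswell, ivybridge/sandybridge, westmere); the best path is chosen at runtime
icc -O3 -axCORE-AVX2,AVX,SSE4.2 mycode.c -o mycode
```

Unlike -xhost, the -ax form also includes a baseline code path, so the same binary runs correctly on every node type in the cluster.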
|Command|Description|
|---|---|
|sinfo|Shows how nodes are being used.|
|sprio|Shows the priority assigned to queued jobs.|
|squeue or qstat|Shows jobs in the queues.|
| |A graphical display of the queues.|
|slurmtop|A text-based view of the cluster nodes.|