Programs are scheduled to run on Della using the sbatch command, a component of Slurm. Your job will be placed in the appropriate quality of service (QOS) based on the time limit you request. For further information, see the Usage Guidelines section and the sbatch man page.
Programming On Della
The GNU and Intel compilers are installed on Della. The standard MPI implementation for Della is OpenMPI, an MPICH-compatible library that supports the InfiniBand infrastructure.
To set up your environment correctly on Della, it is highly recommended that you use the module command, a utility that sets your environment without requiring you to know the paths to the executables. Because environments can be switched quickly, it is easy to compare code compiled with different compilers. In most cases a simple module load openmpi command sets up your environment to use the latest OpenMPI as well as the Intel compilers.
Compiling parallel MPI programs
module load openmpi (loads the openmpi environment as well as the Intel compilers)
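With the module loaded, MPI sources are normally built through OpenMPI's compiler wrappers rather than by calling the compiler directly. The following is a minimal sketch under stated assumptions, not an official recipe for Della: the file name hello_mpi.c and the -O2 flag are chosen purely for illustration, and the two guards exist only so that the snippet can also be tried on a machine without the module system or MPI installed.

```shell
# Load OpenMPI (and, as noted above, the Intel compilers).
# The guard only matters when trying this away from the cluster.
if command -v module >/dev/null 2>&1; then
    module load openmpi
fi

# hello_mpi.c is a hypothetical source file used only for illustration.
cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

# mpicc is OpenMPI's wrapper around the underlying C compiler;
# it adds the MPI include and library paths automatically.
if command -v mpicc >/dev/null 2>&1; then
    mpicc -O2 -o hello_mpi hello_mpi.c
fi
```

The same pattern applies to the other wrappers that OpenMPI provides, such as mpicxx for C++ and mpif90 for Fortran.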
Submitting a Job
Once the executable is compiled, a job script must be created for the scheduler. The cluster has 128 nodes with 12 cores each and 64 nodes with 20 cores each, and the amount of memory installed varies from node to node. To run only on the 20-core Ivy Bridge nodes, add the following line to the job script: #SBATCH -C ivy.
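Because the two node types have different core counts, the node count to request depends on which type the job lands on. The helper below is only an illustrative sketch of that arithmetic; the function name nodes_needed is hypothetical and is not a Slurm or Della utility.

```shell
# Number of nodes needed to hold a given number of cores,
# rounding up when the last node is only partly filled.
# Usage: nodes_needed <total_cores> <cores_per_node>
nodes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

nodes_needed 48 12   # 12-core nodes: prints 4
nodes_needed 48 20   # 20-core Ivy Bridge nodes (#SBATCH -C ivy): prints 3
```

In the second case three 20-core nodes provide 60 cores, so a 48-core job would leave 12 of them idle; choosing a core count that is a multiple of 20 avoids that.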
Here is a sample script which uses 48 cores allocated as 12 cores on 4 nodes:
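#!/bin/bash
# parallel job using 48 cores that runs for 4 hours (max)
#SBATCH -N 4 # node count
#SBATCH --ntasks-per-node=12
#SBATCH -t 4:00:00
# sends mail when process begins, and
# when it ends. Make sure you define your email
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --mail-user=yourNetID@princeton.edu
module load openmpi
# problems with mpiexec/mpirun so use srun
srun ./a.out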
(Change "yourNetID" to your NetID in the above script.)
To submit the job to the batch queuing system, run sbatch followed by the name of the file containing your job script.
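As a variation on the sample script, the sketch below writes a job script that targets only the 20-core nodes and then submits it. It is an illustration under assumptions rather than a site-provided template: the file name job_ivy.slurm and the choice of 40 cores (2 nodes of 20) are hypothetical, and a.out stands for whatever executable you built.

```shell
# Write the job script; job_ivy.slurm is a hypothetical file name.
cat > job_ivy.slurm <<'EOF'
#!/bin/bash
# parallel job using 40 cores on the 20-core nodes, 4 hours max
#SBATCH -N 2                   # node count
#SBATCH --ntasks-per-node=20   # one MPI process per core
#SBATCH -t 4:00:00             # wall-clock limit
#SBATCH -C ivy                 # only the 20-core Ivy Bridge nodes
module load openmpi
srun ./a.out
EOF

# Submit only where Slurm is available; sbatch reports the assigned job ID.
if command -v sbatch >/dev/null 2>&1; then
    sbatch job_ivy.slurm
fi
```

Keeping the resource requests in #SBATCH lines inside the script, rather than on the sbatch command line, means the script itself records exactly what was asked for.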
While the above works fine for parallel MPI jobs, an OpenMP job or other threaded application needs different srun parameters. Because srun launches one process per allocated core, for these types of jobs you want to launch only a single process and let its threads use all the allocated cores. To launch this type of job you can use:
srun -n 1 --cpu-bind=none a.out
Or better yet, simply don't use srun at all for serial or multithreaded jobs: call the executable directly in the job script.
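Put together, a threaded job might look like the sketch below, which writes a job script that requests one task with 12 cores and starts the program without srun. This is an assumption-laden example rather than an official Della template: the file name job_omp.slurm, the 12-core request, and the 4-hour limit are illustrative, and it assumes the program reads its thread count from OMP_NUM_THREADS, as OpenMP programs do.

```shell
# Write the job script; job_omp.slurm is a hypothetical file name.
cat > job_omp.slurm <<'EOF'
#!/bin/bash
# threaded (OpenMP) job: one process using 12 cores on one node
#SBATCH -N 1                  # node count
#SBATCH -n 1                  # a single task (process)
#SBATCH --cpus-per-task=12    # cores available to that task's threads
#SBATCH -t 4:00:00            # wall-clock limit
# Match the thread count to the cores Slurm allocated to the task.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# No srun: start the executable directly.
./a.out
EOF
```

Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given, so changing the request in one place keeps the thread count in step with the allocation.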
|sinfo||Shows how nodes are being used.|
||Shows the priority assigned to queued jobs.|
|squeue or qstat||Shows jobs in the queues.|
||A graphical display of the queues.|
|slurmtop||A text-based view of the cluster nodes.|