Tutorials
Running Programs
Programs are scheduled to run on Tiger using the qsub command, a component of Torque. Users who belong to a group with an associated queue should specify that queue in the job submission script (#PBS -q <queue_name>). All other users should NEVER specify a queue; the job will be placed into the appropriate queue based on the requirements you describe. See Usage Guidelines and the qsub man page for details.
Programming Tiger
The GNU and Intel compilers are installed on Tiger. The standard MPI implementation for Tiger is OpenMPI, an MPICH-compatible library that supports the InfiniBand infrastructure.
To set up your environment correctly on Tiger, we highly recommend using the module facility, a utility that sets up your environment without requiring you to know the paths to the executables. Because different environments can be set up quickly, it is easy to compare code compiled with different compilers. In most cases, a simple module load openmpi command sets up your environment to use the latest OpenMPI as well as the Intel compilers.
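As a rough illustration of the module workflow described above, the following sketch lists the available modules, loads openmpi, and shows what is currently loaded. It assumes the standard Environment Modules subcommands (avail, load, list). The guard around the commands is an addition for illustration, so that the snippet simply prints a message on a machine without the module facility.

```shell
# Sketch only: assumes the standard Environment Modules subcommands.
if command -v module >/dev/null 2>&1; then
    module avail            # list the modules installed on the system
    module load openmpi     # set up OpenMPI and the Intel compilers
    module list             # show the modules currently loaded
else
    echo "module facility not available in this shell"
fi
```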
Compiling parallel MPI programs
module load openmpi (loads the openmpi environment as well as the Intel compilers)
mpif90 myMPIcode.f
mpicc myMPIcode.c
Submitting a Job
Once the executable is compiled, a job script must be created for the scheduler. Each node on this machine has 16 processor cores. Special node properties include the large memory nodes (mem128) and GP-GPUs (gpu). To select these nodes, add the corresponding property to the resource request (for example, nodes=2:ppn=16:mem128).
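For instance, a job header requesting two large-memory nodes might look like the following sketch. The walltime value is an arbitrary placeholder chosen for illustration, not a recommendation.

```shell
#!/bin/bash
# Sketch: request 2 large-memory nodes with 16 cores each.
# The walltime below is a placeholder chosen for illustration.
#PBS -l nodes=2:ppn=16:mem128,walltime=1:00:00
```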
Here is a sample script that uses 32 processors, allocated as 16 processors on each of 2 nodes:
cd mpi_directory
cat parallel.cmd
#!/bin/bash
# parallel job using 32 processors (2 nodes x 16 cores) that runs for 4 hours (max)
#PBS -l nodes=2:ppn=16,walltime=4:00:00
# sends mail if process aborts, when it begins, and
# when it ends (abe). Make sure you define your email
# address.
#PBS -m abe
#PBS -M NetID@princeton.edu
#
module load openmpi
cd /home/yourNetID/mpi_directory
mpiexec ./a.out
(Change "NetID" to your NetID in the above script.)
To submit the job to the batch queuing system use:
qsub parallel.cmd
GPU Usage
To allocate a job using the GPUs (up to 4 per node), add another specifier to the #PBS resource request giving the number of GPUs, and send the job to the GPU queue:
#PBS -l nodes=1:ppn=4:gpus=1
#PBS -q gpu
There are two methods of allocating the 4 GPUs available per node. The first, trivial method is to allocate an entire node (:gpus=4), which gives you access to all 4 GPUs. In this case you are free to assign the GPUs however you wish.
The second case, allocating fewer than 4 GPUs, requires more care. Since other programs may be using some of the GPUs on the node, you must use only those GPUs that the scheduler has allocated to you.
We ask that you specify a ppn value of 4 per GPU requested (for example, nodes=1:ppn=8:gpus=2). This helps us calculate node usage statistics correctly.
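The 4-cores-per-GPU rule can be sketched as a small shell helper that builds the resource string for a single-node GPU job. The function name gpu_resources is hypothetical and not part of Torque; the 1 to 4 range check reflects the 4 GPUs per node mentioned above.

```shell
# Hypothetical helper: build a single-node resource string with 4 cores per GPU.
gpu_resources() {
    gpus="$1"
    # Each node has at most 4 GPUs, so reject anything outside 1..4.
    if [ "$gpus" -lt 1 ] || [ "$gpus" -gt 4 ]; then
        echo "gpus must be between 1 and 4" >&2
        return 1
    fi
    echo "nodes=1:ppn=$((gpus * 4)):gpus=${gpus}"
}

gpu_resources 2   # prints nodes=1:ppn=8:gpus=2
```

The resulting string is what would follow #PBS -l in the job script.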
The scheduler will place your job on a node and assign, via an environment variable, the number of the GPU(s) allocated for your use. Before running your CUDA code, set up your environment so that only the allocated GPU(s) are used. This can be done using the command:
export CUDA_VISIBLE_DEVICES=`grep $HOSTNAME $PBS_GPUFILE | awk -Fu '{printf A$2;A=","}'`
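The command above selects the lines of $PBS_GPUFILE that mention the current host, takes the second field of each line when split on the letter u, and joins the results with commas. The sketch below does the same job on a made-up file, under the assumption that each line has the form <hostname>-gpu<index>. The function name allocated_gpus and the host names are hypothetical. It splits on the string -gpu rather than on u, so that a host name containing the letter u would not confuse it.

```shell
# Hypothetical helper: print the comma-separated GPU indices for one host.
# Assumes each line of the GPU file looks like "<hostname>-gpu<index>".
allocated_gpus() {
    grep -- "$1" "$2" | awk -F'-gpu' '{printf "%s%s", sep, $NF; sep=","}'
}

# Demo with a made-up file standing in for $PBS_GPUFILE.
demo_file=$(mktemp)
printf '%s\n' 'node-a-gpu0' 'node-a-gpu2' 'node-b-gpu1' > "$demo_file"
allocated_gpus node-a "$demo_file"; echo    # prints 0,2
rm -f "$demo_file"

# In a job script this could be used as:
#   export CUDA_VISIBLE_DEVICES=$(allocated_gpus "$HOSTNAME" "$PBS_GPUFILE")
```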
