Info about commands and general usage
Quick-start guide
1. First, create a pbs-script for your program/executable. A pbs-script is an ordinary shell script with some additional PBS-specific directives (lines beginning with #PBS).
Here's an example pbs-script
2. The system contains 3 batch queues:
short: for jobs taking less than 1 hour of cpu-time
medium: for jobs taking between 1 and 12 hours of cpu-time
long: for jobs taking more than 12 hours of cpu-time
You should have a rough idea about the amount of cpu-time your job will require.
3. Then, to submit the job, use the qsub command:
qsub [ -l cput=HH:MM:SS ] my_pbs_script
where,
- HH:MM:SS is the requested cpu-time in hours:minutes:seconds. Fields are read from the right, so -l cput=1:30:00 means 1 hour 30 minutes, while -l cput=1:30 means 1 minute 30 seconds
- Based on the cpu-time you specify, your job is routed to the appropriate queue
- If you do not specify a cpu-time, the system defaults to 1 hour and routes your job to the short queue
- For the short and medium queues, the cut-off time is the lesser of the specified time and the queue's maximum allowed time
- For the long queue, the cut-off time is the specified time
- If a job exceeds its cut-off cpu-time, the server kills it
After submission, qsub prints a job-id for your job
4. To check the status of your job, use the qstat command:
qstat job-id
5. Similarly, to delete your job, use the qdel command:
qdel job-id
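The routing and cut-off rules in step 3 can be sketched as two small bash functions: one converts a cput value to seconds, the other picks a queue. This only illustrates the thresholds listed above and is not the server's actual routing logic. The boundary handling (exactly 1 hour goes to short, exactly 12 hours to medium) is an assumption inferred from the 1-hour default landing in the short queue, and the names cput_to_seconds and queue_for_cput are hypothetical.

```shell
# Hypothetical helpers illustrating the queue thresholds described above.
# The PBS server does the real routing; this only mirrors the documented rules.

# Convert a cput value (HH:MM:SS, MM:SS or SS) to seconds.
# Fields are read from the right, so "1:30" is 1 minute 30 seconds.
cput_to_seconds() {
  local IFS=:
  local -a f
  read -r -a f <<< "$1"
  local h=0 m=0 s=0
  case ${#f[@]} in
    3) h=${f[0]} m=${f[1]} s=${f[2]} ;;
    2) m=${f[0]} s=${f[1]} ;;
    1) s=${f[0]} ;;
    *) echo "invalid cput: $1" >&2; return 1 ;;
  esac
  # 10# forces base 10 so fields such as "08" are not read as octal.
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Print the queue a job would be routed to; no argument means the 1-hour default.
# Assumed boundaries: <= 1 hour is short, <= 12 hours is medium, otherwise long.
queue_for_cput() {
  local secs
  secs=$(cput_to_seconds "${1:-1:00:00}") || return 1
  if (( secs <= 3600 )); then
    echo short
  elif (( secs <= 12 * 3600 )); then
    echo medium
  else
    echo long
  fi
}
```

For example, queue_for_cput 2:0:0 prints medium, while queue_for_cput 1:30 prints short because 1:30 is only 90 seconds.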
Notes:
# The xpbs command is a GUI tool for submitting, monitoring, and deleting your jobs
# Commonly used options to the qsub/qstat/qdel commands are listed below. For further details, refer to the man-pages for these commands
Useful commands and example usage
* qsub
qsub pbs_script # short job
qsub -l cput=2:0:0 pbs_script # medium job
qsub -l cput=13:0:0 -l nodes=1:ppn=4 pbs_script # long job requesting 4 cpus on 1 node
qsub -l cput=10:0:0 -l nodes=1:np2:ppn=2 pbs_script # job requesting 2 cpus on 1 node of type "np2"
qsub -l cput=10:0:0 -l nodes=2:np4 pbs_script # job requesting 2 nodes of type "np4" (1 cpu on each)
qsub -l cput=10:0:0 -l pmem=1gb pbs_script # job requesting 1gb of physical memory per process
* qstat
qstat -n # in addition to the basic information, nodes allocated to a job are listed
qstat -u username # only jobs owned by username are listed
qstat -f jobid # show full listing for jobid
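Steps 3 and 4 can be combined in a small bash helper that captures the job-id printed by qsub and passes it straight to qstat. This is a sketch: the name submit_and_check and its optional cput argument are hypothetical, and it relies only on qsub writing the job-id to standard output.

```shell
# Hypothetical helper: submit a pbs-script, then show the new job's status.
# Usage: submit_and_check SCRIPT [CPUT]
submit_and_check() {
  local script=${1:?usage: submit_and_check SCRIPT [CPUT]}
  local cput=${2:-}
  local jobid

  # Only pass -l cput=... when a time was given; otherwise the 1-hour default applies.
  # qsub prints the job-id on standard output, which we capture here.
  if [[ -n $cput ]]; then
    jobid=$(qsub -l "cput=$cput" "$script") || return 1
  else
    jobid=$(qsub "$script") || return 1
  fi
  echo "Submitted job $jobid"

  # Show the status of the job that was just submitted.
  qstat "$jobid"
}
```

The captured job-id can later be passed to qdel in the same way if the job needs to be deleted.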
Message Passing Interface (MPI) libraries are installed on the cluster; we use the MPICH2 implementation. An alternative version of mpiexec, patched for use with Torque, is also available.
MPICH2 is available at: /usr/local/mpich2/
The Torque-patched mpiexec executable (called from PBS scripts) is at: /usr/local/mpiexec/bin/
Both are already included in the global PATH on Feynman, with /usr/local/mpiexec/bin/ placed ahead of the MPICH2 directories by default
For details on mpiexec, look at its man-page.
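To confirm which mpiexec your shell will actually pick up (for example, after changing PATH in your own login files), a small bash check can help. This is a sketch: the function name check_mpiexec is hypothetical, and its default expected path is simply the mpiexec location given above.

```shell
# Hypothetical check: report which mpiexec is first in PATH and warn if it
# is not the expected (Torque-patched) one.
# Usage: check_mpiexec [EXPECTED_PATH]
check_mpiexec() {
  local expected=${1:-/usr/local/mpiexec/bin/mpiexec}
  local found

  # command -v prints the path the shell would use for "mpiexec".
  found=$(command -v mpiexec) || { echo "mpiexec not found in PATH" >&2; return 1; }

  if [[ $found == "$expected" ]]; then
    echo "ok: $found"
  else
    echo "warning: $found is first in PATH (expected $expected)" >&2
    return 2
  fi
}
```

A return status of 2 means another mpiexec (for example, the one shipped with MPICH2) comes first in your PATH.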
For details on MPICH2, look at:
MPICH2 User Guide [pdf]
MPICH2 Commands and Routines
For job submission, follow the instructions above.
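Putting the pieces together, an MPI pbs-script might look like the sketch below. The job name, resource values and the program ./my_mpi_program are placeholders. It assumes the Torque-patched mpiexec reads the node allocation from PBS and starts one process per allocated processor when no process count is given; check its man-page for the options your version supports. The launch is guarded on PBS_NODEFILE so the file can also be sourced outside a job, for example to try the helper function.

```shell
#!/bin/bash
# Sketch of an MPI pbs-script; names and resource values are placeholders.
#PBS -N mpi_example
#PBS -l nodes=2:ppn=2
#PBS -l cput=2:00:00
#PBS -j oe

# Count processor slots: PBS writes one line per allocated slot to $PBS_NODEFILE.
nprocs_from_nodefile() {
  wc -l < "${1:?node file required}" | tr -d ' '
}

# Only launch inside a PBS job ($PBS_NODEFILE is set by the batch system).
if [[ -n ${PBS_NODEFILE:-} ]]; then
  # By default the job starts in your home directory; move to where qsub was run.
  cd "$PBS_O_WORKDIR" || exit 1
  echo "Job $PBS_JOBID using $(nprocs_from_nodefile "$PBS_NODEFILE") processors"
  # Assumed: the Torque-patched mpiexec takes the node list from PBS itself.
  mpiexec ./my_mpi_program
fi
```

The #PBS -l lines play the same role as the -l options on the qsub command line, so the script can be submitted with a plain qsub as described above.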
