Usage Guidelines
General Notes
The head nodes, tiger 1 and tiger 2, should be used for interactive work only, such as compiling programs and submitting jobs as described below. Do not run jobs on the head nodes, other than brief tests lasting no more than a few minutes.
Job Scheduling
All jobs must be run through the scheduler on Tiger. This cluster uses a number of preemptive queues to control access and provide resources in a timely fashion to those who have contributed the most.
Jobs are further prioritized by the Moab scheduler based on several factors: job size, run time, node availability, wait time, and percentage of usage over a 30-day period.
The scheduler places each job in the test, small, medium, or long queue according to the wall-clock time it requests. The limits for each queue are:
Queue    Wall-clock limit   Max cores per job   Total cores available   Max jobs per user
test     1 hour             2048                9472                    2
small    24 hours           2048                8512                    512
medium   72 hours           512                 7104                    256
long     144 hours          256                 4736                    128
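The queue limits above can be expressed as a small lookup. The sketch below only illustrates the documented limits; it is not Moab's actual routing logic, which also weighs the priority factors listed earlier. The names `Queue`, `QUEUES`, and `pick_queue` are hypothetical, and the sketch assumes the queue is chosen solely by requested wall-clock time, after which the job must also fit that queue's per-job core limit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Queue:
    name: str
    max_hours: float          # wall-clock limit
    max_cores_per_job: int
    total_cores: int
    max_jobs_per_user: int


# Limits copied from the queue table, ordered by increasing wall-clock limit.
QUEUES = [
    Queue("test", 1, 2048, 9472, 2),
    Queue("small", 24, 2048, 8512, 512),
    Queue("medium", 72, 512, 7104, 256),
    Queue("long", 144, 256, 4736, 128),
]


def pick_queue(walltime_hours: float, cores: int) -> Optional[str]:
    """Return the queue implied by the requested wall-clock time, or None
    if the request exceeds every wall-clock limit or the chosen queue's
    per-job core limit. (Hypothetical illustration, not Moab's logic.)"""
    for q in QUEUES:
        if walltime_hours <= q.max_hours:
            # Wall-clock time selects the queue; cores must then fit it.
            return q.name if cores <= q.max_cores_per_job else None
    return None
```

For example, a 48-hour request for 512 cores lands in the medium queue, while a 100-hour request for 512 cores exceeds the long queue's 256-core limit and fits no queue.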
Distribution of CPU and memory
There are 9472 processor cores available across 592 nodes, 16 cores per node. Each node contains at least 64 GB of memory (4 GB per core). The nodes are named r1c1n1 through r10c1n16 (rack, chassis, node): 9 racks contain 4 chassis each, with 16 nodes per chassis, and one rack contains a single chassis.
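The node naming and core counts are consistent: 9 racks × 4 chassis × 16 nodes plus one 16-node chassis gives 592 nodes, and 592 × 16 = 9472 cores. The sketch below enumerates the names, assuming (from the last name, r10c1n16) that racks 1 to 9 are the full racks and rack 10 holds the single chassis; `node_names` and the constants are hypothetical helpers.

```python
CORES_PER_NODE = 16
MEM_GB_PER_NODE = 64  # standard nodes


def node_names() -> list:
    """Generate names r<rack>c<chassis>n<node> for the layout described above.

    Assumes racks 1-9 hold chassis 1-4 and rack 10 holds only chassis 1.
    """
    names = []
    for rack in range(1, 11):
        chassis_count = 4 if rack <= 9 else 1
        for chassis in range(1, chassis_count + 1):
            for node in range(1, 17):  # 16 nodes per chassis
                names.append(f"r{rack}c{chassis}n{node}")
    return names


NODES = node_names()
print(len(NODES))                          # 592
print(len(NODES) * CORES_PER_NODE)         # 9472
print(NODES[0], NODES[-1])                 # r1c1n1 r10c1n16
print(MEM_GB_PER_NODE // CORES_PER_NODE)   # 4 (GB per core)
```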
There are also 48 large-memory nodes with double the memory, 128 GB (8 GB per core). Request them by adding the large-memory node flag (mem128) to your PBS resource request, e.g., nodes=2:ppn=16:mem128.
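A minimal sketch of building the resource string for a large-memory request. The helper name `nodes_request` is hypothetical; it only assembles the nodes=...:ppn=...:mem128 syntax shown above and rejects requests for more than the 48 mem128 nodes or more than 16 cores per node.

```python
MAX_PPN = 16        # cores per node
MEM128_NODES = 48   # number of 128 GB large-memory nodes


def nodes_request(nodes: int, ppn: int = MAX_PPN, mem128: bool = False) -> str:
    """Return a PBS nodes= resource string, e.g. 'nodes=2:ppn=16:mem128'.

    Hypothetical helper: it only formats and sanity-checks the request.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if not 1 <= ppn <= MAX_PPN:
        raise ValueError(f"ppn must be between 1 and {MAX_PPN}")
    if mem128 and nodes > MEM128_NODES:
        raise ValueError(f"only {MEM128_NODES} mem128 nodes exist")
    spec = f"nodes={nodes}:ppn={ppn}"
    if mem128:
        spec += ":mem128"  # restrict placement to the 128 GB nodes
    return spec


print(nodes_request(2, mem128=True))  # nodes=2:ppn=16:mem128
print(nodes_request(4))               # nodes=4:ppn=16
```

The resulting string would typically go in a #PBS -l directive in a job script (for example #PBS -l nodes=2:ppn=16:mem128) or be passed to qsub -l.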
All nodes are connected by InfiniBand switches, which carry MPI traffic and NFS I/O, and by Gigabit Ethernet, which carries PBS and other general traffic.
Appropriate File System Usage
/home (shared via NFS to all the compute nodes) is intended for scripts, source code, executables and small static data sets that may be needed as standard input/configuration for codes.
/scratch/network (shared via NFS to all the compute nodes) is intended for dynamic data that does not require high-bandwidth I/O, such as the final output of a compute job. You may create a directory /scratch/network/myusername and use it for your temporary files. Files are NOT backed up, so move this data to persistent storage once it is no longer needed for continued computation. Also note that these scratch directories may be cleaned nightly to purge files older than 30 days.
/scratch/gpfs is intended for dynamic data that requires higher-bandwidth I/O. Files are NOT backed up, so move this data to persistent storage as soon as it is no longer needed for computations. These files are cleaned nightly to purge files older than 30 days.
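Because the network and GPFS scratch areas are purged of files older than 30 days, it can help to list which of your files are approaching that age before the nightly cleanup. This is a minimal sketch that assumes the purge is based on modification time (the guidelines do not say which timestamp is used); `files_at_risk` is a hypothetical helper.

```python
import os
import stat
import time
from typing import Iterator, Optional, Tuple

PURGE_AGE_DAYS = 30      # purge threshold stated in these guidelines
SECONDS_PER_DAY = 86400


def files_at_risk(root: str, age_days: float = PURGE_AGE_DAYS,
                  now: Optional[float] = None) -> Iterator[Tuple[str, float]]:
    """Yield (path, age_in_days) for regular files under root whose
    modification time is older than age_days (assumed purge criterion)."""
    now = time.time() if now is None else now
    cutoff = now - age_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue             # file vanished or is unreadable
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                yield path, (now - st.st_mtime) / SECONDS_PER_DAY
```

For example, iterating over files_at_risk("/scratch/network/myusername", age_days=25) and printing each path and age would give a few days' warning before files reach the 30-day threshold.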
/tigress-hsm (shared using GPFS, 270 TB) is intended for more persistent storage and should provide high-bandwidth I/O (400 MB/s aggregate bandwidth for jobs across 16 or more nodes). Users receive a default quota of 512 GB when they request a directory in this storage, and a larger quota can be requested. Because this file system is shared by the users of all our systems, please consider what you really need and regularly clean out data that is no longer needed. See /tigress Usage Guidelines for more information.
/scratch (local to each compute node, 146 GB available on each node) is intended for data local to each task of a job and should be cleaned out at the end of each job. It offers the fastest access of the file systems listed here. Note that these scratch directories will be cleaned nightly to purge files older than 30 days.
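Node-local /scratch should be emptied at the end of each job. One way to guarantee that, sketched here under assumptions, is a context manager that creates a private directory for each task and removes it when the block exits, even if the task fails. The `node_scratch` name and the directory-naming scheme are hypothetical; PBS_JOBID is the job ID environment variable set by PBS, with a fallback for runs outside a job.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def node_scratch(base: str = "/scratch") -> Iterator[str]:
    """Create a private directory on node-local scratch and remove it on exit.

    Each call gets a unique directory, so every task of a job can use its own.
    """
    job_id = os.environ.get("PBS_JOBID", "nojob")  # set by PBS inside a job
    path = tempfile.mkdtemp(prefix=f"{job_id}-", dir=base)
    try:
        yield path
    finally:
        # Runs on normal exit and on exceptions, so nothing is left behind.
        shutil.rmtree(path, ignore_errors=True)
```

Copy any results you want to keep (for example to /scratch/gpfs or /tigress-hsm) before the with block ends, since the directory is deleted on exit.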
Please remember that these are shared resources for all users.
