Usage Guidelines
The head node, adroit, should be used for interactive work only, such as compiling programs and submitting jobs as described below. Do not run jobs on the head node, other than brief tests that last no more than a few minutes.
Job Scheduling
All jobs must be run through the scheduler on Adroit. If a job would exceed any of the limits below, it will be held until it is eligible to run. Jobs should not specify a queue; leave the queue unset so that the default queue can route each job according to the resources it requests. Currently, jobs are routed to the short, medium, or long queue, as follows:
short queue
4 hour wall clock limit
16 cores maximum per job
32 job maximum per user
32 processor cores maximum per user
medium queue
24 hour wall clock limit
32 processor cores maximum
long queue
15 day wall clock limit
8 cores maximum per job
8 job maximum per user
16 processor cores maximum per user
32 processor cores maximum allocation
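As a rough illustration of the limits listed above, the following Python sketch picks the first queue whose wall clock and per-job core limits fit a request. It is not the scheduler's actual routing logic: the function name eligible_queue is hypothetical, per-user and allocation limits are ignored, and treating the medium queue's "32 processor cores maximum" as a per-job cap is an assumption.

```python
from datetime import timedelta

# (queue name, wall clock limit, cores per job) taken from the list above.
# Assumption: the medium queue's 32-core figure is read as a per-job cap.
QUEUE_LIMITS = [
    ("short", timedelta(hours=4), 16),
    ("medium", timedelta(hours=24), 32),
    ("long", timedelta(days=15), 8),
]


def eligible_queue(walltime, cores):
    """Return the first queue whose limits fit the request, or None."""
    for name, max_walltime, max_cores in QUEUE_LIMITS:
        if walltime <= max_walltime and cores <= max_cores:
            return name
    return None  # no queue fits; such a job would be held
```

For example, under these assumptions a 16-core request for 2 hours fits the short queue, while a 16-core request for 3 days fits none of them.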
Distribution of CPU and memory
There are 64 processors available, eight per node. Each node contains 16 GB of memory. The nodes are identified as adroit-001 through adroit-008.
Scratch Space
Scratch space is available in /scratch on every node. Create a directory /scratch/network/username and use it for temporary files and data. This space is an NFS-mounted shared space of close to 695 GB. Files are NOT backed up, so move any important files to long-term storage (your home directory or another machine). Note also that these scratch directories are cleaned nightly to purge files older than 15 days.
Also available is approximately 10 GB of local storage on each node, known as /scratch. Because this storage is not shared across nodes, it is best suited for temporary output.
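To see which files in a shared scratch directory are approaching the 15-day purge, something like the following Python sketch may help. The function name files_older_than is hypothetical, and it assumes the nightly cleanup judges age by modification time, which the policy above does not specify.

```python
import os
import time

PURGE_AGE_DAYS = 15  # purge threshold stated in the scratch policy


def files_older_than(root, days=PURGE_AGE_DAYS, now=None):
    """Return sorted paths under root last modified more than `days` ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Assumption: age is measured by modification time.
                if os.lstat(path).st_mtime < cutoff:
                    stale.append(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(stale)
```

Calling it with a smaller value, for instance files_older_than("/scratch/network/username", days=12), would list files a few days before they become eligible for removal.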
Running 3rd-Party Software
If you are running 3rd-party software whose characteristics (e.g., memory usage) you are unfamiliar with, please check your job after 5-15 minutes using 'top' or 'ps -ef' on the compute nodes being used. If the memory usage is growing rapidly, or close to exceeding the per-processor memory limit, you should terminate your job before it causes the system to hang or crash.
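As a complement to checking 'top' by hand, the Python sketch below reads a process's resident memory from /proc on a Linux compute node and compares it with an even per-core share of node memory. The 2 GB figure is an assumption derived from the hardware description (16 GB per node divided by eight processors), not a documented limit, and the function names are hypothetical.

```python
NODE_MEMORY_GB = 16  # per node, from the hardware description
CORES_PER_NODE = 8
# Assumption: an even share of node memory per core (16 GB / 8 = 2 GB).
PER_CORE_LIMIT_KB = NODE_MEMORY_GB * 1024 * 1024 // CORES_PER_NODE


def rss_kb(status_text):
    """Extract the resident set size in kB from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # no VmRSS line (for example, a kernel thread)


def memory_fraction(pid):
    """Return resident memory as a fraction of the assumed per-core share."""
    with open(f"/proc/{pid}/status") as fh:
        rss = rss_kb(fh.read())
    return None if rss is None else rss / PER_CORE_LIMIT_KB
```

A value from memory_fraction that climbs steadily toward 1.0 over successive checks would be the kind of growth the paragraph above warns about.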
Please remember that these are shared resources for all users.
