System crashes due to NFS problems
We have noticed that heavy disk I/O has led to system crashes. You can reduce the load on NFS by making full use of the 25 GB of local disk space available on each compute node under /tmp. A job can use this space as a buffer: fill it with the files the job needs when the job starts, do all intermediate reads and writes locally, and empty it (copying results back) when the job ends. This considerably reduces the number of files open simultaneously over NFS.
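
The stage-in/stage-out pattern described above could be sketched roughly as follows. This is only an illustration, not a site-provided tool: the function name run_with_local_scratch, its parameters, and the "in"/"out" directory layout are hypothetical, and it assumes Python 3.8 or later and a job whose data fits within the 25 GB of local space.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_local_scratch(nfs_input, nfs_output, work, scratch_root="/tmp"):
    """Stage nfs_input into node-local scratch, run work(), copy results back.

    Hypothetical helper: names and layout are illustrative assumptions.
    """
    nfs_input, nfs_output = Path(nfs_input), Path(nfs_output)
    # Private per-job directory under the local scratch root.
    scratch = Path(tempfile.mkdtemp(prefix="job_", dir=scratch_root))
    try:
        local_in = scratch / "in"
        local_out = scratch / "out"
        local_out.mkdir()
        # Fill the buffer once at job start (a single bulk read over NFS).
        shutil.copytree(nfs_input, local_in)
        # All intermediate I/O happens on the local disk.
        work(local_in, local_out)
        # Empty the buffer once at job end (a single bulk write over NFS).
        shutil.copytree(local_out, nfs_output, dirs_exist_ok=True)
    finally:
        # Free the local space for the next job on this node.
        shutil.rmtree(scratch, ignore_errors=True)
```

Here work is any callable that takes the local input and output directories. Because cleanup runs in the finally clause, the scratch directory is removed even if the job fails, so the space under /tmp is not left occupied.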