Debugging with TotalView on the clusters
The following instructions apply to all clusters except for Hecate. Please contact CSES for instructions on using TotalView on Hecate.
Debugging with OpenMPI
- If necessary, start an instance of an X server on your personal system.
- Open an xterm window and ssh into the cluster's head node with X11 forwarding enabled. Depending on your settings, you may need to invoke the ssh client with "ssh -Y".
- On the cluster's head node, add the following to your .cshrc or .bashrc file:
  - Set up your module environment (e.g., module load openmpi).
  - Set the environment variable TVDSVRLAUNCHCMD to 'ssh'.
  - Add the TotalView directory (/opt/export/toolworks/totalview/bin) to your PATH environment variable.
- Create a file called .tvdrc in your home directory containing the line
    source /usr/local/openmpi/<version>/intel<version>/x86_64/etc/openmpi-totalview.tcl
  if you use 'module load openmpi/intel', or the line
    source /usr/local/openmpi/<version>/gcc/x86_64/etc/openmpi-totalview.tcl
  if you use 'module load openmpi/gcc'. Replace each <version> with the version you loaded with the "module load openmpi" command above: the first <version> is the OpenMPI version (a number such as 1.3.0 for OpenMPI 1.3.0), and the one after "intel" is the Intel compiler version (a number such as 110 for Intel 11.0). Type 'module list' to see what was loaded.
- Build your application as you normally would, but add the -g option to the mpicc command line for debugging.
- Create a PBS script file like the one you would use to run your MPI program, but without the mpirun/mpiexec command. It is used only to allocate the nodes and processors.
- Type 'qsub -I -X <pbs-script-name>'. This will create an interactive job with X11 forwarding enabled.
- When you get a compute node prompt, type:
    mpirun --debug -np `wc -l <${PBS_NODEFILE}` <absolute-path-to-your-MPI-program>
- The TotalView windows should now be displayed by your X server.
- You will see a dialog box that states: "Process mpirun is a parallel job. Do you want to stop the job now?"
- Click on Yes, and TotalView will stop at the beginning of your main program.
- You should see your source code, and can set breakpoints, single-step through it, etc.
  If you see assembly code instead, you most likely forgot to use the -g option when building your program.
- When done debugging, click on File | Exit to close the TotalView windows.
- Log out of the compute node session to end the PBS job.
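For bash users, the head-node setup step above might look like the following ~/.bashrc fragment. This is a minimal sketch: the 'command -v module' guard is our own addition so the file does not fail in a shell where the module command is unavailable, and 'openmpi' is simply the example module name from the text.

```shell
# ~/.bashrc fragment (sketch): environment for TotalView with OpenMPI.

# Load the MPI module only if the 'module' command is available.
if command -v module >/dev/null 2>&1; then
  module load openmpi
fi

# Have TotalView start its remote debug servers (tvdsvr) via ssh.
export TVDSVRLAUNCHCMD=ssh

# Put the TotalView executables on the search path.
export PATH="/opt/export/toolworks/totalview/bin:$PATH"
```

In .cshrc the equivalent lines are `setenv TVDSVRLAUNCHCMD ssh` and `set path = (/opt/export/toolworks/totalview/bin $path)`. For the MPICH instructions, load mpich-debug instead of openmpi.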
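The .tvdrc step can also be scripted. In this sketch OMPI_VERSION and COMPILER_DIR are hypothetical placeholder variables, set to the example values from the text (OpenMPI 1.3.0 built with Intel 11.0); replace them with what 'module list' reports, and use COMPILER_DIR=gcc for the gcc build. Note that the redirection overwrites any existing ~/.tvdrc.

```shell
# Hypothetical placeholders: set these from the output of 'module list'.
OMPI_VERSION="1.3.0"     # OpenMPI version, e.g. 1.3.0
COMPILER_DIR="intel110"  # e.g. intel110 for Intel 11.0, or gcc

TCL_FILE="/usr/local/openmpi/${OMPI_VERSION}/${COMPILER_DIR}/x86_64/etc/openmpi-totalview.tcl"

# Warn (but continue) if the expected file does not exist on this machine.
if [ ! -f "$TCL_FILE" ]; then
  echo "warning: $TCL_FILE not found; check 'module list'" >&2
fi

# Write the single 'source' line to ~/.tvdrc (replaces an existing file).
printf 'source %s\n' "$TCL_FILE" > "$HOME/.tvdrc"
```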
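To rule out the missing -g case before starting TotalView, one option (not part of the procedure above) is to look for DWARF debug sections with readelf from GNU binutils. The function name has_debug_info and the program name my_mpi_prog are hypothetical, and the check assumes an ELF executable. It is a heuristic: a binary linked against other objects that carry debug information can pass even if your own sources were built without -g.

```shell
# Succeeds if the ELF file has a .debug_info section, i.e. it was most
# likely compiled with -g. Example use (hypothetical program name):
#   mpicc -g -o my_mpi_prog my_mpi_prog.c
#   has_debug_info my_mpi_prog || echo "rebuild with -g"
has_debug_info() {
  # -S lists section headers, -W avoids truncating long section names.
  readelf -S -W "$1" 2>/dev/null | grep -q '\.debug_info'
}
```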
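A PBS script of the kind described in the steps above (resource requests only, no mpirun/mpiexec line) could be created as follows. The file name tv_debug.pbs, the job name and the resource values (2 nodes with 4 processors each, one hour of walltime) are hypothetical; request whatever your normal MPI job uses.

```shell
# Create a PBS script that only requests resources (no mpirun/mpiexec).
# File name, job name and resource values are hypothetical examples.
cat > tv_debug.pbs <<'EOF'
#!/bin/bash
#PBS -N tv_debug
#PBS -l nodes=2:ppn=4
#PBS -l walltime=01:00:00
EOF
```

The interactive job with X11 forwarding is then started with `qsub -I -X tv_debug.pbs`; in interactive mode only the #PBS directives are used.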
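Inside the interactive job, the OpenMPI launch command can be written with the process count in a variable. PBS writes one line per allocated processor to $PBS_NODEFILE, so counting its lines gives the number of MPI processes, which is what the backquoted `wc -l` in the instructions does. The helper np_from_nodefile and the program name my_mpi_prog are hypothetical, and $PWD/my_mpi_prog assumes you run the command from the directory containing the executable, giving the absolute path the instructions call for.

```shell
# Count processor slots: PBS lists one line per allocated processor.
np_from_nodefile() {
  # tr strips the leading spaces some wc implementations print.
  wc -l < "$1" | tr -d ' '
}

# Only meaningful inside a PBS job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
  NP=$(np_from_nodefile "$PBS_NODEFILE")
  # my_mpi_prog is a hypothetical program name; $PWD makes the path absolute.
  mpirun --debug -np "$NP" "$PWD/my_mpi_prog"
fi
```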
Debugging with MPICH
- If necessary, start an instance of an X server on your personal system.
- Open an xterm window and ssh into the cluster's head node with X11 forwarding enabled. Depending on your settings, you may need to invoke the ssh client with "ssh -Y".
- On the cluster's head node, add the following to your .cshrc or .bashrc file:
  - Set up your module environment (e.g., module load mpich-debug).
  - Set the environment variable TVDSVRLAUNCHCMD to 'ssh'.
  - Add the TotalView directory (/opt/export/toolworks/totalview/bin) to your PATH environment variable.
- Build your application as you normally would, but add the -g option to the mpicc command line for debugging.
- Create a PBS script file like the one you would use to run your MPI program, but without the mpirun/mpiexec command. It is used only to allocate the nodes and processors.
- Type 'qsub -I -X <pbs-script-name>'. This will create an interactive job with X11 forwarding enabled.
- When you get a compute node prompt, type:
    mpirun -tv -machinefile ${PBS_NODEFILE} -np `wc -l <${PBS_NODEFILE}` \
        <absolute-path-to-your-MPI-program>
- The TotalView windows should now be displayed by your X server.
- The code should be stopped at the beginning of your main program.
- You should see your source code, and can set breakpoints, single-step through it, etc.
  If you see assembly code instead, you most likely forgot to use the -g option when building your program.
- When you click 'Go' for the first time, you will halt at a permanent MPIR_Breakpoint deep in the MPI_Init() code. Just click 'Go' again to reach your first breakpoint.
- When done debugging, click on File | Exit to close the TotalView windows.
- Log out of the compute node session to end the PBS job.
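For MPICH, the launch command from the steps above can be written with the process count in a variable; it adds -tv and passes the PBS node file as the machine file. This is a sketch: my_mpi_prog is a hypothetical program name, $PWD/my_mpi_prog assumes you start from the directory containing the executable, and the line count of $PBS_NODEFILE is taken as the number of processes because PBS lists one line per allocated processor.

```shell
# Only meaningful inside a PBS job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
  # One line per allocated processor, so the line count is the process count.
  NP=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')
  # -tv starts the job under TotalView; my_mpi_prog is hypothetical.
  mpirun -tv -machinefile "$PBS_NODEFILE" -np "$NP" "$PWD/my_mpi_prog"
fi
```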
