ARM software
Introduction
ARM DDT (formerly known as Allinea DDT) is a powerful commercial parallel debugger with a graphical user interface. It can be used to debug serial, MPI, multi-threaded, and CUDA programs, or any combination of these, written in C, C++, and Fortran. MAP, an efficient parallel profiler, is another very useful tool from ARM (formerly Allinea).
The following modules are available on Graham:
- ddt-cpu, for CPU debugging and profiling;
- ddt-gpu, for GPU or mixed CPU/GPU debugging.
The following module is available on Niagara:
- ddt
As this is a GUI application, log in using ssh -Y, and use an SSH client such as MobaXTerm (Windows), or an X server such as XQuartz (Mac), to ensure proper X11 tunnelling.
Both DDT and MAP are normally used interactively through their GUI, typically within an allocation obtained with the salloc command (see below for details). MAP can also be used non-interactively, in which case it can be submitted to the scheduler with the sbatch command.
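For the non-interactive case, a minimal sketch of a job script follows. It reuses the ddt-cpu module and the resource values from the interactive example on this page. The file name map_job.sh is hypothetical, and the exact map invocation (the --profile option followed by an mpirun launch line) is an assumption that should be checked against the MAP user guide for the installed version.

```shell
# Write a hypothetical job script to map_job.sh (the name is an assumption).
# The quoted 'EOF' keeps $SLURM_NTASKS literal so it is expanded at run time.
cat > map_job.sh <<'EOF'
#!/bin/bash
#SBATCH --time=0-1:00
#SBATCH --mem-per-cpu=4G
#SBATCH --ntasks=4

module load ddt-cpu

# Assumed invocation: --profile runs MAP without opening the GUI.
map --profile mpirun -n "$SLURM_NTASKS" path/to/code
EOF
```

The script would then be submitted with sbatch map_job.sh. No --x11 option is requested here because no GUI is expected to open in this mode.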
The current license limits the use of DDT/MAP to a maximum of 512 CPU cores across all users at any given time, while DDT-GPU is limited to 8 GPUs.
Usage
CPU-only code, no GPUs
1. Allocate the node or nodes on which to do the debugging or profiling. This will open a shell session on the allocated node.
salloc --x11 --time=0-1:00 --mem-per-cpu=4G --ntasks=4
2. Load the appropriate module, for example
module load ddt-cpu
3. Run the ddt or map command.
ddt path/to/code
map path/to/code
- Before pressing the Run button, make sure the MPI implementation shown in the DDT/MAP application window is the default OpenMPI. If it is not, press the Change button next to the Implementation: string and select the correct option from the drop-down menu. Also specify the desired number of CPU cores in this window.
4. When done, exit the shell to terminate the allocation.
IMPORTANT: The current versions of DDT and OpenMPI have a compatibility issue that breaks an important DDT feature: displaying message queues (available from the "Tools" drop-down menu). As a workaround, execute the following command before running DDT:
export OMPI_MCA_pml=ob1
Be aware that this workaround can make your MPI code run slower, so use it only when debugging.
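Because the setting should apply only while debugging, one option is to scope it to a single command rather than exporting it for the whole shell session. The sketch below uses a hypothetical helper named with_ob1 (not part of DDT or OpenMPI) that sets OMPI_MCA_pml only in the environment of the command it launches.

```shell
# Hypothetical helper: run one command with OMPI_MCA_pml=ob1 set only for
# that command, leaving the calling shell's environment unchanged.
with_ob1() {
    env OMPI_MCA_pml=ob1 "$@"
}
```

With this helper, with_ob1 ddt path/to/code starts the debugger with the workaround in place, while later runs from the same shell are unaffected.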
CUDA code
1. Allocate the node or nodes on which to do the debugging or profiling with salloc. This will open a shell session on the allocated node.
salloc --x11 --time=0-1:00 --mem-per-cpu=4G --ntasks=1 --gres=gpu:1
2. Load the appropriate module, for example
module load ddt-gpu
- This may fail with a suggestion to load an older version of OpenMPI first. In this case, reload the OpenMPI module with the suggested command, and then reload the ddt-gpu module.
module load openmpi/2.0.2
module load ddt-gpu
3. Ensure a cuda module is loaded.
module load cuda
4. Run the ddt command.
ddt path/to/code
5. When done, exit the shell to terminate the allocation.
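The fallback described in step 2 can be sketched as a small shell function. The name load_ddt_gpu is hypothetical, the default version 2.0.2 is only the example given in step 2 (use whatever version the error message suggests), and the sketch assumes that module load returns a non-zero exit status when it fails.

```shell
# Hypothetical helper: try to load ddt-gpu; if that fails, load the OpenMPI
# version given as $1 (default 2.0.2, the example from step 2) and retry.
load_ddt_gpu() {
    if module load ddt-gpu; then
        return 0
    fi
    module load "openmpi/${1:-2.0.2}" && module load ddt-gpu
}
```

Calling load_ddt_gpu with no argument retries with openmpi/2.0.2. A different version can be passed as the first argument.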
Known issues
MAP
- There appear to be issues with the newest version of MAP (ddt-cpu/18.3) installed on Graham; for now, users have to use the (slightly) older version ddt-cpu/7.1.
See also
- "Code profiling on Graham", video, 54 minutes.