ARM software: Difference between revisions

m (Rdickson moved page Allinea software to ARM software without leaving a redirect: Part of translatable page "Allinea software")
Revision as of 16:11, 28 February 2018


Introduction

ARM DDT (formerly known as Allinea DDT) is a powerful commercial parallel debugger with a graphical user interface. It can be used to debug serial, MPI, multi-threaded, and CUDA programs, or any combination of these, written in C, C++, and Fortran. MAP, an efficient parallel profiler, is another very useful tool from ARM (formerly Allinea).

The following modules are available on Graham:

  • allinea-cpu, for CPU debugging and profiling;
  • allinea-gpu, for GPU or mixed CPU/GPU debugging.

As this is a GUI application, log in using ssh -Y, and make sure X11 tunnelling works on your side, for example by using an SSH client with a built-in X server such as MobaXTerm (Windows) or by running an X server such as XQuartz (Mac).

Both DDT and MAP are normally used interactively through their GUI, in a session started with the salloc command (see below for details). MAP can also be used non-interactively, in which case the job can be submitted to the scheduler with the sbatch command.
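
As a rough sketch of the non-interactive case, the snippet below writes a small Slurm job script that runs MAP with its --profile option and then prints the sbatch command that would submit it. The file name map_job.sh, the resource values (borrowed from the interactive example on this page), and the use of srun as the launcher are illustrative assumptions, not a tested recipe for Graham.

```shell
#!/bin/bash
# Sketch only: write a Slurm job script that runs MAP without the GUI.
# map_job.sh and path/to/code are placeholders; adjust resources as needed.
job_script="map_job.sh"

cat > "$job_script" <<'EOF'
#!/bin/bash
#SBATCH --time=0-1:00
#SBATCH --mem-per-cpu=4G
#SBATCH --ntasks=4

module load allinea-cpu

# --profile runs MAP without the GUI and writes a .map file for later viewing.
# Using srun as the MPI launcher here is an assumption.
map --profile srun path/to/code
EOF

# Print the submission command instead of running it.
echo "sbatch $job_script"
```

The resulting .map file could then be opened later in the MAP GUI from an interactive session.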

The current license limits the use of DDT/MAP to a maximum of 512 CPU cores across all users at any given time, while DDT-GPU is limited to 8 GPUs.

Usage

CPU-only code, no GPUs

1. Allocate the node or nodes on which to do the debugging or profiling. This will open a shell session on the allocated node.

salloc --x11 --time=0-1:00 --mem-per-cpu=4G --ntasks=4

2. Load the appropriate module, for example:

module load allinea-cpu

This may fail with a suggestion to load an older version of OpenMPI first. In that case, load the OpenMPI module with the suggested command, and then load the allinea-cpu module again:

module load openmpi/2.0.2
module load allinea-cpu

3. Run the ddt or map command.

ddt path/to/code
map path/to/code

Before pressing the Run button, make sure that the MPI implementation shown in the Allinea application window is the default OpenMPI. If it is not, press the Change button next to the Implementation: string and select the correct option from the drop-down menu.

4. When done, exit the shell to terminate the allocation.
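
The module-loading step above, including the OpenMPI fallback, could be wrapped in a small shell function along these lines. This is a minimal sketch: load_allinea is a hypothetical helper name, the default OpenMPI version is the one mentioned on this page, and it assumes that module load returns a non-zero exit status on failure, which depends on the module system in use.

```shell
# Sketch: load an Allinea module, retrying after loading the older OpenMPI
# if the first attempt fails. load_allinea is a hypothetical helper name.
load_allinea() {
    local tool_module="${1:-allinea-cpu}"    # or allinea-gpu
    local mpi_module="${2:-openmpi/2.0.2}"   # version taken from this page

    # First attempt: load the tool module directly.
    if module load "$tool_module"; then
        return 0
    fi

    # Fallback: load the older OpenMPI, then try the tool module again.
    echo "Retrying after loading $mpi_module" >&2
    module load "$mpi_module" && module load "$tool_module"
}
```

In the allocated shell, load_allinea allinea-cpu would then stand in for step 2, followed by the ddt or map command as before.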

CUDA code

1. Allocate the node or nodes on which to do the debugging or profiling with salloc. This will open a shell session on the allocated node.

salloc --x11 --time=0-1:00 --mem-per-cpu=4G --ntasks=1 --gres=gpu:1

2. Load the appropriate module, for example:

module load allinea-gpu

This may fail with a suggestion to load an older version of OpenMPI first. In that case, load the OpenMPI module with the suggested command, and then load the allinea-gpu module again:

module load openmpi/2.0.2
module load allinea-gpu

3. Ensure a CUDA module is loaded.

module load cuda

4. Run the ddt command.

ddt path/to/code

5. When done, exit the shell to terminate the allocation.
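
Before launching DDT in the steps above, it can be useful to confirm that the shell really has a GPU attached. The sketch below is one possible check. The helper name has_gpu_allocation is hypothetical, and the check assumes that Slurm exports CUDA_VISIBLE_DEVICES for jobs requested with --gres=gpu, which is common behaviour but is not stated on this page.

```shell
# Sketch: confirm the shell has a GPU before starting DDT.
# has_gpu_allocation is a hypothetical helper name.
has_gpu_allocation() {
    # Assumption: Slurm sets CUDA_VISIBLE_DEVICES for --gres=gpu jobs.
    if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo "GPU(s) visible: $CUDA_VISIBLE_DEVICES"
    else
        echo "No GPU visible; was --gres=gpu:1 passed to salloc?" >&2
        return 1
    fi
}
```

A possible use inside the allocation is has_gpu_allocation && ddt path/to/code, so that DDT only starts when a GPU is visible.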

Known issues

MPI DDT

  • The debugger does not show queued MPI messages (e.g. when paused in an MPI deadlock); the cause is not yet known.

OpenMP DDT

  • The memory debugging module (which is off by default) does not work.

CUDA DDT

  • The memory debugging module (which is off by default) does not work.

MAP

  • MAP currently does not work correctly on Graham; we are working on resolving this issue. For the moment, the workaround is to request a SHARCNET account from your Compute Canada account (via CCDB) and run MAP on Orca's development nodes using these instructions.