PGPROF

From Alliance Doc
Revision as of 19:30, 21 November 2018 by FuzzyBot (talk | contribs) (Updating to match new version of source page)

PGPROF is a powerful and simple tool for analyzing the performance of parallel programs written with OpenMP, MPI, OpenACC, or CUDA. It has two profiling modes: command-line profiling and visual profiling.

Quickstart guide

Using PGPROF usually consists of two steps:

  1. Data collection: Run the application with profiling enabled.
  2. Analysis: Visualize the data produced in the first step.

Both steps can be accomplished in either command-line mode or graphical mode.

Environment modules

Before you start profiling with PGPROF, the appropriate module needs to be loaded.

PGPROF is part of the PGI compiler package, so run module avail pgi to see which versions are currently available with the compiler, MPI, and CUDA modules you have loaded. For a comprehensive list of PGI modules, run module -r spider '.*pgi.*'. At the time of writing, these were:

  • pgi/13.10
  • pgi/17.3

Use module load pgi/version to choose a version. For example, to load the PGI compiler version 17.3, do:

[name@server ~]$ module load pgi/17.3

Compile your code

To get useful information from PGPROF, you first need to compile your code with one of the PGI compilers (pgcc for C, pgc++ for C++, pgfortran for Fortran). Fortran source code may need to be compiled with the -g flag.
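
As a rough illustration of the compiler choice described above, the following shell sketch maps a source file extension to the matching PGI compiler and prints the resulting command. The helper name compile_cmd and the file name swim.f90 are hypothetical, and passing -g for C and C++ as well as Fortran is an assumption made for uniformity, not a requirement stated here.

```shell
# Sketch: choose the PGI compiler from the source file extension and
# print the compile command. compile_cmd is a hypothetical helper name.
compile_cmd() {
    src="$1"
    exe="${src%.*}"                 # strip the extension: swim.f90 -> swim
    case "$src" in
        *.c)               echo "pgcc -g -o $exe $src" ;;
        *.cpp|*.cc|*.cxx)  echo "pgc++ -g -o $exe $src" ;;
        *.f|*.f90|*.F90)   echo "pgfortran -g -o $exe $src" ;;
        *)                 echo "unsupported source file: $src" >&2
                           return 1 ;;
    esac
}

# Print the command for a hypothetical Fortran source file.
compile_cmd swim.f90
```

The function only prints the command, so it can be inspected before it is run on a system where the pgi module is loaded.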

Working in command-line mode

In command-line mode, two distinct commands are used to collect timing data and to analyze it.

First, use PGPROF to run the application and save the performance data in a file. In this example, the application is a.out and we choose to save the data in a.prof.

[name@server ~]$ pgprof -o a.prof ./a.out

Optionally, the saved data file can also be analyzed in graphical mode (see below) using File | Import.

To visualize the performance data in command-line mode:

[name@server ~]$ pgprof -i a.prof
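
The two commands above can be chained in a small wrapper so that the report is printed only when data collection succeeds. This is a minimal sketch: profile_and_report is a hypothetical helper name, and the default output file name (the executable name with .prof appended) is an assumption chosen for illustration.

```shell
# Sketch: collect a profile, then print the report, stopping on failure.
# profile_and_report is a hypothetical helper, not part of PGPROF.
profile_and_report() {
    exe="$1"
    out="${2:-${exe##*/}.prof}"     # default: <executable name>.prof
    if ! command -v pgprof >/dev/null 2>&1; then
        echo "pgprof not found; load a pgi module first" >&2
        return 1
    fi
    pgprof -o "$out" "$exe" || return 1   # step 1: data collection
    pgprof -i "$out"                      # step 2: analysis
}

# Usage (on a system with the pgi module loaded):
#   profile_and_report ./a.out a.prof
```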

The results are usually divided into several categories:

  • GPU kernel execution profile
  • CUDA API execution profile
  • OpenACC execution profile
  • CPU execution profile

======== Profiling result:
Time(%)      Time     Calls       Avg       Min       Max  Name
 38.14%  1.41393s        20  70.696ms  70.666ms  70.731ms  calc2_198_gpu
 31.11%  1.15312s        18  64.062ms  64.039ms  64.083ms  calc3_273_gpu
 23.35%  865.68ms        20  43.284ms  43.244ms  43.325ms  calc1_142_gpu
  5.17%  191.78ms       141  1.3602ms  1.3120us  1.6409ms  [CUDA memcpy HtoD]
...
======== API calls:
Time(%)      Time     Calls       Avg       Min       Max  Name
 92.65%  3.49314s        62  56.341ms  1.8850us  70.771ms  cuStreamSynchronize
  3.78%  142.36ms         1  142.36ms  142.36ms  142.36ms  cuDevicePrimaryCtxRetain
...
======== OpenACC (excl):
Time(%)      Time     Calls       Avg       Min       Max  Name
 36.27%  1.41470s        20  70.735ms  70.704ms  70.773ms  acc_wait@swim-acc-data.f:223
 29.60%  1.15449s        18  64.138ms  64.114ms  64.159ms  acc_wait@swim-acc-data.f:302

======== CPU profiling result (bottom up):
Time(%)      Time  Name
 59.09%  8.55785s  cudbgGetAPIVersion
 59.09%  8.55785s   start_thread
 59.09%  8.55785s     clone
 25.75%  3.73007s  cuStreamSynchronize
 25.75%  3.73007s   __pgi_uacc_cuda_wait
 25.75%  3.73007s     __pgi_uacc_computedone
 10.38%  1.50269s       swim_mod_calc2_
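
When a report like the one above is long, it can help to keep only the rows above a given share of the time. The sketch below assumes the text report is available on standard input, for instance from a file into which the profiler output was redirected (whether the report is written to stdout or stderr should be checked, so 2>&1 may be needed). The helper name top_entries is hypothetical, and the sample rows are copied from the GPU kernel table above.

```shell
# Sketch: print report rows whose Time(%) is at or above a threshold.
# top_entries is a hypothetical helper; it reads a text report on stdin.
top_entries() {
    threshold="${1:-10}"            # default threshold: 10 percent
    awk -v t="$threshold" '
        $1 ~ /^[0-9.]+%$/ {         # data rows start with a percentage
            pct = $1
            sub(/%/, "", pct)       # drop the percent sign
            if (pct + 0 >= t) print # keep the row unchanged
        }'
}

# Example with rows taken from the report above.
top_entries 30 <<'EOF'
Time(%)      Time     Calls       Avg       Min       Max  Name
 38.14%  1.41393s        20  70.696ms  70.666ms  70.731ms  calc2_198_gpu
 31.11%  1.15312s        18  64.062ms  64.039ms  64.083ms  calc3_273_gpu
 23.35%  865.68ms        20  43.284ms  43.244ms  43.325ms  calc1_142_gpu
  5.17%  191.78ms       141  1.3602ms  1.3120us  1.6409ms  [CUDA memcpy HtoD]
EOF
```

With a threshold of 30, only the calc2_198_gpu and calc3_273_gpu rows are kept. Header and separator lines are skipped because their first field is not a percentage.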

The output can be cropped to show one of the categories. For example, the option --cpu-profiling will show only the CPU results.

The option --cpu-profiling-mode top-down makes PGPROF show the main subroutine at the top, followed by the functions it calls:

[name@server ~]$ pgprof --cpu-profiling-mode top-down -i a.prof 
======== CPU profiling result (top down):
Time(%)      Time  Name
 97.36%  35.2596s  main
 97.36%  35.2596s   MAIN_
 32.02%  11.5976s     swim_mod_calc3_
 29.98%  10.8578s     swim_mod_calc2_
 25.93%  9.38965s     swim_mod_calc1_
  6.82%  2.46976s     swim_mod_inital_
  1.76%  637.36ms   __fvd_sin_vex_256

To find out which part of your application takes the longest to run, use the option --cpu-profiling-mode bottom-up. It orients the call tree to show each function followed by the functions that called it, working backwards to main.

[name@server ~]$ pgprof --cpu-profiling-mode bottom-up -i a.prof
======== CPU profiling result (bottom up):
Time(%)      Time  Name
 32.02%  11.5976s  swim_mod_calc3_
 32.02%  11.5976s   MAIN_
 32.02%  11.5976s     main
 29.98%  10.8578s  swim_mod_calc2_
 29.98%  10.8578s   MAIN_
 29.98%  10.8578s     main
 25.93%  9.38965s  swim_mod_calc1_
 25.93%  9.38965s   MAIN_
 25.93%  9.38965s     main
  3.43%  1.24057s  swim_mod_inital_

Working in graphical mode

Figure: Starting a new PGPROF session

In graphical mode, both data collection and analysis can be accomplished in the same session. There are several steps that need to be done to collect and visualize performance data in this mode:

  • Launch the PGI profiler.
    • Since the PGPROF GUI is based on Java, it should be run on a compute node in an interactive session rather than on a login node, as the latter does not have enough memory (see Java for more details). An interactive session can be started with salloc --x11 ... to enable X11 forwarding (see Interactive jobs for more details).
  • In order to start a new session, open the File menu and click on New Session.
  • Select the executable file you want to profile and then add any arguments appropriate for your profiling.
  • Click Next, then Finish.
  • In the CPU Details tab, click the Show the top-down (callers first) call tree view button as shown in the figure below.

Figure: Visualizing performance data

Take note of these four panes in the graphical interface (see the image "Visualizing performance data", to the left):

  • The Timeline: shows all the events ordered by the time they executed
  • GPU details: shows performance details for the GPU kernels
  • CPU details: shows performance details for the CPU functions
  • The Property tab: shows all the details for a selected function in the timeline window
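
The launch step described in this section might be scripted as follows inside an interactive job. This is a sketch under stated assumptions: launch_gui is a hypothetical helper name, the salloc resource values in the comment are placeholders, and it is assumed that running pgprof without arguments opens the graphical interface.

```shell
# Sketch: run inside an interactive job started with X11 forwarding, e.g.
#   salloc --x11 --time=1:00:00 --mem=4G    (resource values are placeholders)
# launch_gui is a hypothetical helper, not part of PGPROF.
launch_gui() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; X11 forwarding appears to be off" >&2
        return 1
    fi
    if ! command -v pgprof >/dev/null 2>&1; then
        echo "pgprof not found; try: module load pgi/17.3" >&2
        return 1
    fi
    pgprof "$@"    # assumption: with no arguments, pgprof opens the GUI
}

# Usage (inside the interactive session):
#   launch_gui
```

Checking DISPLAY first gives a clearer message than a Java stack trace when the session was started without X11 forwarding.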

References

PGPROF is a product of PGI, which is a subsidiary of NVIDIA Corporation.