(Updating to match new version of source page) |
|||
Line 2: | Line 2: | ||
PGPROF is a powerful and simple tool for analyzing the performance of parallel programs written with OpenMP, MPI, OpenACC, or CUDA. | PGPROF is a powerful and simple tool for analyzing the performance of parallel programs written with OpenMP, MPI, OpenACC, or CUDA. | ||
There are two profiling modes: Command-line | There are two profiling modes: Command-line mode and graphical mode. | ||
= Quickstart guide = | = Quickstart guide = | ||
Line 14: | Line 14: | ||
PGPROF is part of the PGI compiler package, so run <code>module avail pgi</code> to see what versions are currently available with the compiler, MPI, and CUDA modules you have loaded. For a comprehensive list of PGI modules, run <code>module -r spider '.*pgi.*'</code>. | PGPROF is part of the PGI compiler package, so run <code>module avail pgi</code> to see what versions are currently available with the compiler, MPI, and CUDA modules you have loaded. For a comprehensive list of PGI modules, run <code>module -r spider '.*pgi.*'</code>. | ||
As of December 2018, these were: | |||
* pgi/13.10 | * pgi/13.10 | ||
* pgi/17.3 | * pgi/17.3 | ||
Use <code>module load pgi/version</code> to choose a version | Use <code>module load pgi/version</code> to choose a version; for example, to load the PGI compiler version 17.3, use: | ||
{{Command|module load pgi/17.3}} | {{Command|module load pgi/17.3}} | ||
== Compile your code == | == Compile your code == | ||
To get useful information from | To get useful information from PGPROF, you first need to compile your code with one of the PGI compilers (<code>pgcc</code> for C, <code>pgc++</code> for C++, <code>pgfortran</code> for Fortran). Fortran source code may need to be compiled with the <code>-g</code> flag. | ||
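As a minimal sketch of the compile step, assuming a Fortran source file with the hypothetical name <code>main.f90</code> and an already loaded PGI module, the command could be assembled as follows. The output name <code>a.out</code> is chosen only to match the executable used in the profiling commands on this page; the file name and the guard around the compiler call are illustrative assumptions, not part of the original instructions.

```shell
# Hypothetical source file name (an assumption); replace it with your own.
SRC=main.f90

# Compile command: -g adds debugging information, -o names the executable.
CMD="pgfortran -g -o a.out $SRC"
echo "$CMD"

# Run the compiler only if it is actually available on the PATH.
if command -v pgfortran >/dev/null 2>&1; then
    $CMD
fi
```

For C or C++ sources, <code>pgcc</code> or <code>pgc++</code> would take the place of <code>pgfortran</code> in the same pattern.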
== | == Command-line mode == | ||
Data collection: Use PGPROF to run the application and save the performance data in a file. In this example, the application | |||
is <code>a.out</code> and we choose to save the data in <code>a.prof</code>. | is <code>a.out</code> and we choose to save the data in <code>a.prof</code>. | ||
{{Command|pgprof -o a.prof ./a.out}} | {{Command|pgprof -o a.prof ./a.out}} | ||
To visualize the performance data in command-line mode: | You can optionally save the data file and analyze it in graphical mode (see below) using ''File | Import''. | ||
<br>Analysis: To visualize the performance data in command-line mode: | |||
{{Command|pgprof -i a.prof}} | {{Command|pgprof -i a.prof}} | ||
The results are usually divided into several categories: | The results are usually divided into several categories, for example: | ||
* GPU kernel execution profile | * GPU kernel execution profile | ||
* CUDA API execution profile | * CUDA API execution profile | ||
Line 102: | Line 101: | ||
}} | }} | ||
== | == Graphical mode == | ||
[[File: | [[File:pgprof-start-session.png|thumbnail|300px|Starting a new PGPROF session (click for a larger image)|left ]] | ||
In graphical mode, both data collection and analysis can be accomplished in the same session. Collecting and visualizing performance data in this mode takes several steps: | In graphical mode, both data collection and analysis can be accomplished in the same session. Collecting and visualizing performance data in this mode takes several steps: | ||
* Launch the PGI profiler. | * Launch the PGI profiler. | ||
Line 111: | Line 110: | ||
* Select the executable file you want to profile and then add any arguments appropriate for your profiling. | * Select the executable file you want to profile and then add any arguments appropriate for your profiling. | ||
* Click ''Next'', then ''Finish''. | * Click ''Next'', then ''Finish''. | ||
* In the ''CPU Details'' tab, | * In the ''CPU Details'' tab, click on the ''Show the top-down (callers first) call tree view'' button as shown in the figure below. | ||
[[File: | [[File:pgprof2.png|thumbnail|300px|Visualizing performance data (click for a larger image)|left ]] | ||
Take note of these four panes in the graphical interface (see the image "Visualizing performance data", to the left): | Take note of these four panes in the graphical interface (see the image "Visualizing performance data", to the left): | ||
* The Timeline: shows all the events in the order in which they executed | * The Timeline: shows all the events in the order in which they executed | ||
* GPU | * '''GPU Details''': shows performance details for the GPU kernels | ||
* CPU | * '''CPU Details''': shows performance details for the CPU functions | ||
* | * '''Properties''': shows all the details for a selected function in the timeline window | ||
<br clear=all> | <br clear=all> | ||