|content=
* Understand what a profiler is
* Understand how to use the PGPROF profiler
* Understand how the code is performing
* Understand where to focus your time and rewrite the most time-consuming routines
}}
== Code profiling ==
Why would one need to profile code? Because it is the only way to understand:
# Where time is being spent (hotspots)
# How the code is performing
# Where to focus your time
What is so important about hotspots in the code?
Amdahl's law says that parallelizing the most time-consuming routines (i.e. the hotspots) will have the most impact, because the overall speedup is limited by the fraction of run time that is left unaccelerated.
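To make this concrete, the short program below evaluates Amdahl's law, S = 1 / ((1 - p) + p / s), where p is the fraction of the original run time spent in the accelerated code and s is the speedup of that code. It is a minimal sketch: the 80% and 5% time fractions and the 10x speedup are hypothetical numbers, not measurements taken from the sample code.

<syntaxhighlight lang="cpp">
#include <cassert>
#include <cstdio>

// Amdahl's law: overall speedup when a fraction p of the original run time
// is made s times faster and the remaining (1 - p) is left unchanged.
double amdahl_speedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

int main() {
    // Hypothetical profile: a hotspot taking 80% of the run time and a
    // minor routine taking 5%, each assumed to become 10x faster on the GPU.
    const double s = 10.0;
    std::printf("Accelerating the hotspot (p = 0.80):     %.2fx\n", amdahl_speedup(0.80, s));
    std::printf("Accelerating a minor routine (p = 0.05): %.2fx\n", amdahl_speedup(0.05, s));
    return 0;
}
</syntaxhighlight>

With these assumed numbers, accelerating the hotspot gives roughly a 3.57x overall speedup, while accelerating the minor routine gives only about 1.05x. This is why finding the hotspots with a profiler comes first.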
== Build the Sample Code ==
For this example, we will use code from the [https://github.com/calculquebec/cq-formation-openacc repository]. Download the package and change to the '''cpp''' or '''f90''' directory. The goal of this exercise is to compile and link the code, obtain an executable, and then profile it.
{{Callout
|title=Which compiler?
Once the executable has been created, we will profile the code.
{{Callout
|title=Which profiler?
|content=
For the purpose of this tutorial, we use several profilers, as described below:
=== PGPROF | === PGPROF Profiler === | ||
[[File:Pgprof new0.png|thumbnail|300px|Starting new session|left ]] | [[File:Pgprof new0.png|thumbnail|300px|Starting new session|left ]] | ||
These next pictures demonstrate how to start with the PGPROF profiler. The first step is to initiate a new session. | |||
Then browse for an executable file of the code you want to profile. | Then, browse for an executable file of the code you want to profile. | ||
Finally, specify the profiling options; for example, if you need to profile CPU activity then click the "Profile execution of the CPU" box. | |||
=== NVIDIA Visual | === NVIDIA Visual Profiler === | ||
Another profiler available for OpenACC applications is NVIDIA Visual Profiler. It's a | Another profiler available for OpenACC applications is the NVIDIA Visual Profiler. It's a crossplatform analyzing tool for code written with OpenACC and CUDA C/C++ instructions. | ||
[[File:Nvvp-pic0.png|thumbnail|300px| | [[File:Nvvp-pic0.png|thumbnail|300px|NVVP profiler|right ]] | ||
[[File:Nvvp-pic1.png|thumbnail|300px|Browse for executable you want to profile|right ]] | [[File:Nvvp-pic1.png|thumbnail|300px|Browse for the executable you want to profile|right ]] | ||
=== NVIDIA NVPROF Command Line Profiler === | === NVIDIA NVPROF Command Line Profiler === | ||
== Compiler Feedback ==
Before working on the routine, we need to understand what the compiler is actually doing by asking ourselves the following questions:
* What optimizations were applied?
* What prevented further optimizations?
* Can very minor modifications of the code affect performance?
The PGI compiler provides a '''-Minfo''' flag with the following options:
'''Computational Intensity = Compute Operations / Memory Operations'''
A computational intensity of 1.0 or greater suggests that the loop might run well on a GPU.
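As an illustration, the sketch below applies this formula to two hypothetical loop bodies whose operations are counted by hand: an axpy update, y[i] = a * x[i] + y[i], and a cubic polynomial evaluated with Horner's rule. The counting convention (scalars such as a and the polynomial coefficients kept in registers, each array load or store counted as one memory operation) is an assumption made for this example; the compiler's own report may count differently.

<syntaxhighlight lang="cpp">
#include <cassert>
#include <cstdio>

// Hand-counted operations for ONE iteration of a loop body.
// The counts are illustrative; a compiler may count differently.
struct LoopOps {
    int compute_ops;  // arithmetic operations (+, -, *, /)
    int memory_ops;   // array loads and stores
};

// Computational Intensity = Compute Operations / Memory Operations
double computational_intensity(const LoopOps& ops) {
    assert(ops.memory_ops > 0);
    return static_cast<double>(ops.compute_ops) / ops.memory_ops;
}

// Print the intensity and apply the 1.0 rule of thumb from the text.
void report(const char* name, const LoopOps& ops) {
    double ci = computational_intensity(ops);
    std::printf("%-10s intensity = %.2f (%s)\n", name, ci,
                ci >= 1.0 ? "might run well on a GPU" : "likely limited by memory traffic");
}

int main() {
    // y[i] = a * x[i] + y[i];
    // compute: 1 multiply + 1 add = 2; memory: load x[i], load y[i], store y[i] = 3
    report("axpy", LoopOps{2, 3});

    // y[i] = c0 + x[i] * (c1 + x[i] * (c2 + x[i] * c3));   (Horner's rule)
    // compute: 3 multiplies + 3 adds = 6
    // memory: load x[i] (assumed loaded once and reused), store y[i] = 2
    report("polynomial", LoopOps{6, 2});
    return 0;
}
</syntaxhighlight>

Under these counts, the axpy loop has an intensity of about 0.67 (2 compute operations for 3 memory operations), while the polynomial loop reaches 3.0 (6 compute operations for 2 memory operations).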
== Understanding the code ==
Let's look closely at the following code:
<syntaxhighlight lang="cpp" line highlight="1,5,10,12">
for(int i=0;i<num_rows;i++) {