OpenACC Tutorial - Profiling/fr: Difference between revisions

Updating to match new version of source page
(Created page with "Tutoriel OpenACC – Profils")
 
(Updating to match new version of source page)
Line 5: Line 5:
|content=
|content=
* Understand what a profiler is
* Understand what a profiler is
* Understand how to use PGPROF profiler.
* Understand how to use the PGPROF profiler
* Understand how the code is performing .
* Understand how the code is performing  
* Understand where to focus your time and re-write most time consuming routines
* Understand where to focus your time and rewrite the most time-consuming routines
}}
}}


== Gathering a Profile ==
== Code profiling ==
Why would one needs to gather a profile of a code ? Because it's the only way to understand:
Why would one need to profile code? Because it's the only way to understand:
# Where time is being spent (Hotspots)
# Where time is being spent (Hotspots)
# How the code is performing
# How the code is performing
# Where to focus your time
# Where to focus your time


What is so important about the hotspots of the code ?  
What is so important about hotspots in the code?
The Amdahl's law says that "Parallelizing the most time-consuming (i.e. the hotspots) routines will have the most impact".
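Amdahl's law implies that parallelizing the most time-consuming routines (i.e. the hotspots) will have the most impact, because the overall speedup is limited by the fraction of the run time that is left unaccelerated.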
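To make the role of hotspots concrete, here is a minimal C++ sketch of the usual form of Amdahl's law, speedup = 1 / ((1 - p) + p / s). The function name <code>amdahl_speedup</code> and the parameters <code>p</code> (fraction of run time spent in the accelerated routine) and <code>s</code> (speedup of that routine) are illustrative assumptions and are not part of the tutorial code.

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law: overall speedup when a fraction `p` of the run time
// (0 <= p <= 1) is accelerated by a factor `s` (s > 0).
// Both names are hypothetical, chosen only for this illustration.
double amdahl_speedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    // (1 - p) is the part left untouched; p / s is the accelerated part.
    return 1.0 / ((1.0 - p) + p / s);
}
```

With these assumptions, <code>amdahl_speedup(0.9, 10.0)</code> is about 5.3, while <code>amdahl_speedup(0.1, 10.0)</code> is only about 1.1: the same tenfold acceleration pays off far more when it is applied to a routine that dominates the run time.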


== Build the Sample Code ? ==
== Build the Sample Code ==
For this example we will use a code from the [https://github.com/calculquebec/cq-formation-openacc repositories]. Download the package and change to the '''cpp''' or '''f90''' directory. The point of this exercise is to compile&link the code, obtain executable, and then profile them.
For this example we will use code from the [https://github.com/calculquebec/cq-formation-openacc repositories]. Download the package and change to the '''cpp''' or '''f90''' directory. The object of this exercise is to compile and link the code, obtain an executable, and then profile it.
{{Callout
{{Callout
|title=Which compiler?
|title=Which compiler?
Line 43: Line 43:
After the executable is created, we are going to profile that code.
After the executable is created, we are going to profile that code.
{{Callout
{{Callout
|title=Which profiller ?
|title=Which profiler?
|content=
|content=
For the purpose of this tutorial, we use several profilers as described below:  
For the purpose of this tutorial, we use several profilers as described below:  
Line 53: Line 53:




=== PGPROF Profiller ===
=== PGPROF Profiler ===
[[File:Pgprof new0.png|thumbnail|300px|Starting new session|left  ]]
[[File:Pgprof new0.png|thumbnail|300px|Starting new session|left  ]]
Bellow are several snapshots demonstrating how to start with the PGPROF profiler. First step is to initiate a new session.  
The following screenshots show how to get started with the PGPROF profiler. The first step is to initiate a new session.
Then browse for an executable file of the code you want to profile.
Then, browse for an executable file of the code you want to profile.
Then specify the profiling options. For example, if you need to profile CPU activity then set the "Profile execution of the CPU" box.
Finally, specify the profiling options; for example, if you need to profile CPU activity, check the "Profile execution of the CPU" box.


=== NVIDIA Visual Profiller ===
=== NVIDIA Visual Profiler ===


Another profiler available for OpenACC applications is NVIDIA Visual Profiler. It's a cross-platform analyzing tool for the codes written with OpenACC and CUDA C/C++ instructions.
Another profiler available for OpenACC applications is the NVIDIA Visual Profiler. It's a cross-platform analysis tool for code written with OpenACC and CUDA C/C++ instructions.
[[File:Nvvp-pic0.png|thumbnail|300px|The NVVP profiler|right  ]]
[[File:Nvvp-pic0.png|thumbnail|300px|NVVP profiler|right  ]]
[[File:Nvvp-pic1.png|thumbnail|300px|Browse for executable you want to profile|right  ]]
[[File:Nvvp-pic1.png|thumbnail|300px|Browse for the executable you want to profile|right  ]]


=== NVIDIA NVPROF Command Line Profiler  ===
=== NVIDIA NVPROF Command Line Profiler  ===
Line 86: Line 86:


== Compiler Feedback  ==
== Compiler Feedback  ==
Before working on the routine, we need to understand what the compiler is actually doing. Several questions we got to ask ourselves:
Before working on the routine, we need to understand what the compiler is actually doing by asking ourselves the following questions:
* What optimizations were applied ?  
* What optimizations were applied?  
* What prevented further optimizations ?
* What prevented further optimizations?
* Can very minor modification of the code affect the performance ?
* Can very minor modifications of the code affect performance?


The PGI compiler offers you a '''-Minfo''' flag with the following options:
The PGI compiler offers you a '''-Minfo''' flag with the following options:
Line 148: Line 148:
'''Computational Intensity = Compute Operations / Memory Operations'''
'''Computational Intensity = Compute Operations / Memory Operations'''


Computational Intensity of 1.0 or greater is often a clue that something might run well on a GPU.
A computational intensity of 1.0 or greater suggests that the loop might run well on a GPU.
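As a rough illustration of the ratio above, the following C++ sketch computes it from operation counts obtained by hand. The helper <code>computational_intensity</code> and the example loop body are assumptions made for this illustration; they do not come from the tutorial's sample code, and real counts depend on how the compiler actually schedules loads and stores.

```cpp
#include <cassert>
#include <cmath>

// Computational intensity = compute operations / memory operations.
// Hypothetical helper; the counts are supplied by the caller.
double computational_intensity(int compute_ops, int memory_ops) {
    assert(memory_ops > 0);
    return static_cast<double>(compute_ops) / memory_ops;
}

// Hypothetical loop body, counted by hand:
//     y[i] = a * x[i] + y[i];
// compute operations: 1 multiply + 1 add           = 2
// memory operations:  load x[i], load y[i], store y[i] = 3
double axpy_intensity() {
    return computational_intensity(2, 3);
}
```

Under these hand counts the loop body has an intensity of 2/3, which is below 1.0, so by the rule of thumb above it would be dominated by memory traffic rather than arithmetic.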


== Understanding the code  ==
== Understanding the code  ==
Lets look more closely at the following code:
Let's look closely at the following code:
<syntaxhighlight lang="cpp" line highlight="1,5,10,12">
<syntaxhighlight lang="cpp" line highlight="1,5,10,12">
for(int i=0;i<num_rows;i++) {
for(int i=0;i<num_rows;i++) {