OpenACC Tutorial - Profiling: Difference between revisions

Showing nvprof first - nvvp will be moved to another page.
(The Visual Profiler only works with CUDA C/C++ or OpenACC codes)
(Showing nvprof first - nvvp will be moved to another page.)
Line 25: Line 25:


== Build the Sample Code == <!--T:10-->
== Build the Sample Code == <!--T:10-->
For this example we will use code from this [https://github.com/calculquebec/cq-formation-openacc Git repository].
For this example we use code from this [https://github.com/calculquebec/cq-formation-openacc Git repository].
Download the package and go to the <code>cpp</code> or the <code>f90</code> directory.
Download the package and go to the <code>cpp</code> or the <code>f90</code> directory.
The object of this exercise is to compile and link the code, obtain an executable, and then profile it.
The object of this exercise is to compile and link the code, obtain an executable, and then profile it.
Line 78: Line 78:
<translate>
<translate>
<!--T:7-->
<!--T:7-->
For the purpose of this tutorial, we use two profilers as described below:  
For the purpose of this tutorial, we use two profilers:  
* NVIDIA Visual Profiler (NVVP) - a cross-platform analysis tool for code written with OpenACC and CUDA C/C++ instructions.
* NVPROF - a command-line, text-based profiler that can also analyze non-GPU code.
* NVPROF - a command-line, text-based version of the NVIDIA Visual Profiler.
* NVIDIA Visual Profiler (NVVP) - a graphical, cross-platform analysis tool for code written with OpenACC and CUDA C/C++ instructions.
</translate>
</translate>
}}
}}
<translate>
Since our previously built <code>cg.x</code> is not yet using the GPU, we will start the analysis with the <code>nvprof</code> profiler.


=== NVIDIA NVPROF Command Line Profiler === <!--T:15-->
NVIDIA also provides a command-line profiler called NVPROF, similar to GNU gprof.
</translate>
{{Command
|module load cuda/11.7
}}
{{Command
|nvprof --cpu-profiling on ./cg.x
|result=
...
<Program output >
...
======== CPU profiling result (bottom up):
Time(%)      Time  Name
83.54%  90.6757s  matvec(matrix const &, vector const &, vector const &)
83.54%  90.6757s  {{!}} main
  7.94%  8.62146s  waxpby(double, vector const &, double, vector const &, vector const &)
  7.94%  8.62146s  {{!}} main
  5.86%  6.36584s  dot(vector const &, vector const &)
  5.86%  6.36584s  {{!}} main
  2.47%  2.67666s  allocate_3d_poisson_matrix(matrix&, int)
  2.47%  2.67666s  {{!}} main
  0.13%  140.35ms  initialize_vector(vector&, double)
  0.13%  140.35ms  {{!}} main
...
======== Data collected at 100Hz frequency
}}
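The bottom-up listing above can be read by eye, but for longer profiles it may help to rank the functions automatically. The following is a minimal sketch, not part of the tutorial code: the helper names <code>parse_profile</code> and <code>amdahl_bound</code> are hypothetical, the sample rows are copied from the output above with the wiki <code>{{!}}</code> written as a plain <code>|</code>, and only the <code>s</code> and <code>ms</code> time units seen in that output are handled. The second helper applies Amdahl's law to estimate an upper bound on the overall speedup if only the top function were accelerated.

```python
import re

# Sample rows copied from the nvprof bottom-up CPU profile shown above.
# Assumption: caller rows start with "|" once the wiki template is rendered.
SAMPLE = """\
83.54%  90.6757s  matvec(matrix const &, vector const &, vector const &)
83.54%  90.6757s  | main
  7.94%  8.62146s  waxpby(double, vector const &, double, vector const &, vector const &)
  7.94%  8.62146s  | main
  5.86%  6.36584s  dot(vector const &, vector const &)
  5.86%  6.36584s  | main
  2.47%  2.67666s  allocate_3d_poisson_matrix(matrix&, int)
  2.47%  2.67666s  | main
  0.13%  140.35ms  initialize_vector(vector&, double)
  0.13%  140.35ms  | main
"""

# One row: percentage, time value, unit (ms or s), then the function name.
ROW = re.compile(r"^\s*([\d.]+)%\s+([\d.]+)(ms|s)\s+(.*)$")


def parse_profile(text):
    """Return (percent, seconds, name) tuples, largest share first."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # skip headers, ellipses and blank lines
        pct, value, unit, name = match.groups()
        if name.startswith("|"):
            continue  # caller row, already counted under its callee
        seconds = float(value) / 1000.0 if unit == "ms" else float(value)
        rows.append((float(pct), seconds, name))
    return sorted(rows, reverse=True)


def amdahl_bound(percent):
    """Upper bound on overall speedup if this share of the time dropped to zero."""
    return 1.0 / (1.0 - percent / 100.0)


if __name__ == "__main__":
    profile = parse_profile(SAMPLE)
    top_pct, top_seconds, top_name = profile[0]
    print(f"hotspot: {top_name} ({top_pct}% of samples, {top_seconds:.1f} s)")
    print(f"best-case overall speedup: {amdahl_bound(top_pct):.2f}x")
```

For this profile the sketch singles out <code>matvec</code>, which suggests that it is the first routine worth porting to the GPU.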
<translate>
<translate>
=== NVIDIA Visual Profiler === <!--T:13-->
 
=== NVIDIA Visual Profiler - (to be moved to another page) === <!--T:13-->
[[File:Nvvp-pic0.png|thumbnail|300px|NVVP profiler|right]]
[[File:Nvvp-pic1.png|thumbnail|300px|Browse for the executable you want to profile|right]]


<!--T:14-->
<!--T:14-->
Line 103: Line 135:
}}
}}
<translate>
<translate>
[[File:Nvvp-pic0.png|thumbnail|300px|NVVP profiler|right]]
[[File:Nvvp-pic1.png|thumbnail|300px|Browse for the executable you want to profile|right]]


# After the NVVP startup window, you get prompted for a ''Workspace'' directory, which will be used for temporary files. Replace <code>home</code> with <code>scratch</code> in the suggested path. Then click ''OK''.
# After the NVVP startup window, you get prompted for a ''Workspace'' directory, which will be used for temporary files. Replace <code>home</code> with <code>scratch</code> in the suggested path. Then click ''OK''.
Line 114: Line 144:
# Click ''Next >'' to review additional profiling options.
# Click ''Next >'' to review additional profiling options.
# Click ''Finish'' to start profiling the executable.
# Click ''Finish'' to start profiling the executable.
=== NVIDIA NVPROF Command Line Profiler === <!--T:15-->
NVIDIA also provides a command-line profiler called NVPROF, similar to GNU gprof.
</translate>
{{Command
|nvprof --cpu-profiling on ./cg.x
|result=
<Program output >
======== CPU profiling result (bottom up):
84.25% matvec(matrix const &, vector const &, vector const &)
84.25% main
9.50% waxpby(double, vector const &, double, vector const &, vector const &)
3.37% dot(vector const &, vector const &)
2.76% allocate_3d_poisson_matrix(matrix&, int)
2.76% main
0.11% __c_mset8
0.03% munmap
  0.03% free_matrix(matrix&)
    0.03% main
======== Data collected at 100Hz frequency
}}
<translate>


== Compiler Feedback  == <!--T:16-->
== Compiler Feedback  == <!--T:16-->