OpenACC Tutorial - Profiling: Difference between revisions

Line 23: Line 23:
Why are hotspots in the code so important?
Amdahl's law implies that parallelizing the most time-consuming routines (i.e. the hotspots) has the most impact, because the overall speedup is limited by the fraction of the run time that is not accelerated.
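As a rough illustration of this point, the following sketch evaluates the usual form of Amdahl's law. The function name <code>amdahl_speedup</code> and its parameters are hypothetical and are not part of the tutorial code.

```cpp
#include <stdexcept>

// Hypothetical helper (not part of the tutorial code).
// Amdahl's law: overall speedup when a fraction `parallel_fraction` of the
// original run time is accelerated by a factor `section_speedup`.
double amdahl_speedup(double parallel_fraction, double section_speedup) {
    if (parallel_fraction < 0.0 || parallel_fraction > 1.0 || section_speedup <= 0.0) {
        throw std::invalid_argument("fraction must be in [0,1] and speedup must be > 0");
    }
    // The non-accelerated part keeps its original cost; the accelerated part shrinks.
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / section_speedup);
}
```

For instance, accelerating by 10x a routine that accounts for 90% of the run time gives roughly a 5.3x overall speedup, whereas the same 10x on a routine that accounts for 10% of the run time gives only about 1.1x.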
</translate>
 
<translate>
== Build the Sample Code == <!--T:10-->
For this example we will use code from the [https://github.com/calculquebec/cq-formation-openacc cq-formation-openacc repository]. Download the package and change to the '''cpp''' or '''f90''' directory. The goal of this exercise is to compile and link the code, obtain an executable, and then profile it.
Line 53: Line 52:
}}
}}


<translate>
Once the executable has been built, we can profile it.
</translate>
{{Callout
|title=<translate><!--T:6-->
Line 68: Line 70:
}}


 
<translate>
=== PGPROF Profiler ===
[[File:Pgprof new0.png|thumbnail|300px|Starting a new PGPROF session|left]]
Line 83: Line 85:
=== NVIDIA NVPROF Command Line Profiler ===
NVIDIA also provides a command-line profiler called NVPROF, similar to gprof.
</translate>
{{Command
|nvprof --cpu-profiling on ./cg.x
Line 100: Line 103:
======== Data collected at 100Hz frequency
}}
 
<translate>
== Compiler Feedback ==
Before working on the routine, we need to understand what the compiler is actually doing by asking ourselves the following questions:
Line 118: Line 121:
CXXFLAGS=-fast -Minfo=all,intensity,ccff LDFLAGS=${CXXFLAGS}
* Rebuild
</translate>
{{Command
|make
Line 158: Line 162:
pgc++ CXXFLAGS=-fast -Minfo=all,intensity,ccff LDFLAGS=-fast main.o -o cg.x -fast
}}
 
<translate>
== Computational Intensity ==
The computational intensity of a loop measures how much computation is performed relative to the number of memory operations.
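As a minimal sketch of this idea, the ratio can be computed by counting the operations in one loop iteration by hand. The helper <code>computational_intensity</code> below is hypothetical, and treating intensity as a simple ratio of compute operations to memory operations is an assumption; the way the compiler counts operations for its own intensity feedback may differ.

```cpp
#include <stdexcept>

// Hypothetical helper (not part of the tutorial code).
// Assumption: intensity = compute operations / memory operations,
// both counted by hand for a single loop iteration.
double computational_intensity(int compute_ops, int memory_ops) {
    if (compute_ops < 0 || memory_ops <= 0) {
        throw std::invalid_argument("compute_ops must be >= 0 and memory_ops must be > 0");
    }
    return static_cast<double>(compute_ops) / memory_ops;
}
```

For example, a loop body such as <code>y[i] = a * x[i] + y[i];</code> performs two arithmetic operations (one multiply, one add) and three memory operations (two loads, one store), so under this counting its intensity is 2/3.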
Line 168: Line 172:
== Understanding the code ==
Let's look closely at the following code:
</translate>
<syntaxhighlight lang="cpp" line highlight="1,5,10,12">
for(int i=0;i<num_rows;i++) {
Line 182: Line 187:
}
</syntaxhighlight>
<translate>
Given the code above, we search for data dependencies:
* Does one loop iteration affect other loop iterations?
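To make the question about loop iterations concrete, here is a small sketch contrasting a loop whose iterations are independent with one that has a loop-carried dependency. The functions <code>scale</code> and <code>running_sum</code> are hypothetical examples and are not taken from the tutorial code. Running the iterations in reverse order is used only as an informal check: if the result changes, the iterations are not independent, but an unchanged result does not by itself prove that they are.

```cpp
#include <cstddef>
#include <vector>

// Independent iterations: out[i] depends only on in[i],
// so the iteration order does not matter.
std::vector<double> scale(const std::vector<double>& in, double factor, bool reversed) {
    const std::size_t n = in.size();
    std::vector<double> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        const std::size_t i = reversed ? n - 1 - k : k;
        out[i] = factor * in[i];
    }
    return out;
}

// Loop-carried dependency: iteration i reads data[i - 1], which the
// previous iteration has just updated, so the iteration order matters.
std::vector<double> running_sum(std::vector<double> data, bool reversed) {
    const std::size_t n = data.size();
    for (std::size_t k = 1; k < n; ++k) {
        const std::size_t i = reversed ? n - k : k;
        data[i] += data[i - 1];
    }
    return data;
}
```

With the input <code>{1, 2, 3}</code>, <code>scale</code> returns the same vector in both orders, while <code>running_sum</code> returns <code>{1, 3, 6}</code> in forward order and <code>{1, 3, 5}</code> in reverse order.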
Line 189: Line 195:
[[OpenACC Tutorial - Adding directives|Onward to the next unit: Adding directives]]<br>
[[OpenACC Tutorial|Back to the lesson plan]]
</translate>