OpenACC Tutorial - Adding directives: Difference between revisions

[[File:Openacc profiling1.png|thumbnail|<translate>Click to enlarge</translate>]]
<translate>
The results are correct. However, instead of a speedup, we get a slowdown by a factor of almost 4! Let's profile the code again using NVIDIA's visual profiler (<tt>nvvp</tt>). This can be done with the following steps:
# Start <tt>nvvp</tt> with the command <tt>nvvp &</tt> (the <tt>&</tt> sign starts it in the background)
# Go to File -> New Session
# Click "Next" until you can click "Finish".  


This will run the program and generate a timeline of the execution. The resulting timeline is shown in the image on the right. As we can see, almost all of the run time is spent transferring data between the host and the device. This is very often the case when one ports code from CPU to GPU. We will look at how to optimize this in the [[OpenACC Tutorial - Data movement|next part of the tutorial]].


== The <tt>parallel loop</tt> directive ==