OpenACC Tutorial - Adding directives/fr: Difference between revisions

}}
}}
[[File:Openacc profiling1.png|thumbnail|Click to enlarge.]]
The results are correct. However, far from gaining any speed, the run took almost four times longer! Let's profile the code again using the NVIDIA Visual Profiler (<tt>nvvp</tt>) to see what is happening. This can be done with the following steps:
# Start <tt>nvvp</tt> with the command <tt>nvvp &</tt>, where the <tt>&</tt> sign starts it in the background.
# Select ''File -> New Session''.
# In the "File:" field, search for the executable (named <tt>challenge</tt> in our example).
# Click "Next" until you can click "Finish".


This will run the program and generate a timeline of the execution. The resulting timeline is shown in the image on the right. As we can see, almost all of the run time is spent transferring data between the host and the device. This is very often the case when porting code from CPU to GPU. We will look at how to optimize this in the [[OpenACC Tutorial - Data movement|next part of the tutorial]].
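
To make the transfer cost concrete, here is a minimal sketch in C. It is not the tutorial's <tt>challenge</tt> code: the function names <tt>scale_repeatedly</tt> and <tt>bytes_transferred</tt> are hypothetical, and the explicit <tt>copy</tt> clause is an assumption chosen to make the data movement visible. Because each sweep opens its own compute region with no enclosing data region, the array is copied to the device on entry and back to the host on exit, once per sweep.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not the tutorial's challenge code.
   Scales x in place, `sweeps` times. Each sweep opens its own OpenACC
   compute region, so with no enclosing data region x is copied to the
   device on entry and back to the host on exit, every single sweep.
   Without an OpenACC compiler the pragma is ignored and the loop runs
   on the CPU with the same result. */
void scale_repeatedly(double *x, size_t n, int sweeps, double factor)
{
    assert(x != NULL);
    for (int s = 0; s < sweeps; ++s) {
        #pragma acc kernels copy(x[0:n])
        for (size_t i = 0; i < n; ++i) {
            x[i] *= factor;
        }
    }
}

/* Bytes crossing the host-device link for the function above:
   one copy-in plus one copy-out of n doubles per sweep. */
size_t bytes_transferred(size_t n, int sweeps)
{
    return 2u * n * sizeof(double) * (size_t)sweeps;
}
```

The arithmetic work per sweep is a single multiplication per element, while the traffic per sweep is two full copies of the array, which is the kind of imbalance that shows up as a transfer-dominated timeline in the profiler.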