OpenACC Tutorial - Adding directives/fr: Difference between revisions

Revision as of 15:44, 25 May 2021

241 bytes added, 3 years ago

Updating to match new version of source page

FuzzyBot

@@ Line 143: / Line 143: @@
 {{Command
-|pgc++ -fast -Minfo{{=}}accel -ta{{=}}tesla:managed main.cpp -o challenge
+|nvc++ -fast -Minfo{{=}}accel -ta{{=}}tesla:managed main.cpp -o challenge
 |result=
 ...
 matvec(const matrix &, const vector &, const vector &):
 , include "matrix_functions.h"
-, Generating copyout(ycoefs[:num_rows])
+, Generating implicit copyin(xcoefs[:]) [if not already present]
-               Generating copyin(xcoefs[:],Acoefs[:],cols[:],row_offsets[:num_rows+1])
+              Generating implicit copyout(ycoefs[:num_rows]) [if not already present]
-, Complex loop carried dependence of row_offsets-> prevents parallelization
+               Generating implicit copyin(row_offsets[:num_rows+1],Acoefs[:],cols[:]) [if not already present]
-              Loop carried dependence of ycoefs-> prevents parallelization
+, Loop carried dependence of ycoefs-> prevents parallelization
                Loop carried backward dependence of ycoefs-> prevents vectorization
-               Complex loop carried dependence of cols->,Acoefs->,xcoefs-> prevents parallelization
+               Complex loop carried dependence of Acoefs->,xcoefs-> prevents parallelization
-              Accelerator kernel generated
                Generating Tesla code
-, #pragma acc loop vector(128) /* threadIdx.x */
+, #pragma acc loop seq
-, Sum reduction generated for sum
+, #pragma acc loop vector(128) /* threadIdx.x */
-, Loop is parallelizable
+                  Generating implicit reduction(+:sum)
+, Loop is parallelizable
 }}
 As the compiler output shows, the compiler could not parallelize the two loops. The following sections show how to deal with this.
@@ Line 164: / Line 164: @@
 |title=Choosing a compiler
 |content=
-As of May 2016, relatively few compilers supported the OpenACC features. The most advanced in this respect are the compilers from the [http://www.pgroup.com/ Portland Group], owned by [http://www.nvidia.com/content/global/global.php NVIDIA], and those from [http://www.cray.com/ Cray]. As for [https://gcc.gnu.org/wiki/OpenACC GNU], the OpenACC implementation was experimental and was expected to be complete in version 6.
+<div class="mw-translate-fuzzy">
+As of May 2016, relatively few compilers supported the OpenACC features. The most advanced in this respect are the compilers from the [http://www.pgroup.com/ Portland Group], owned by [http://www.nvidia.com/content/global/global.php NVIDIA], and those from [http://www.cray.com/ Cray]. As for [https://gcc.gnu.org/wiki/OpenACC GNU], the OpenACC implementation was experimental and was expected to be complete in version 6.
+</div>
-In this tutorial, we use version 16.3 of the [http://www.pgroup.com/support/download_pgi2016.php?view=current Portland Group compilers], which are free for academic research purposes.
+<div class="mw-translate-fuzzy">
+In this tutorial, we use version 16.3 of the [http://www.pgroup.com/support/download_pgi2016.php?view=current Portland Group compilers], which are free for academic research purposes.
+</div>
 }}
@@ Line 211: / Line 215: @@
 Note that the other pointers do not need to be restricted, since the compiler does not report them as causing problems. When we recompile with the changes we just made, the compiler emits the following message:
 {{Command
-|pgc++ -fast -Minfo{{=}}accel -ta{{=}}tesla:managed main.cpp -o challenge
+|nvc++ -fast -Minfo{{=}}accel -ta{{=}}tesla:managed main.cpp -o challenge
 |result=
 matvec(const matrix &, const vector &, const vector &):
 , include "matrix_functions.h"
-, Generating copyout(ycoefs[:num_rows])
+, Generating implicit copyout(ycoefs[:num_rows]) [if not already present]
-               Generating copyin(xcoefs[:],Acoefs[:],cols[:],row_offsets[:num_rows+1])
+               Generating implicit copyin(xcoefs[:],row_offsets[:num_rows+1],Acoefs[:],cols[:]) [if not already present]
 , Loop is parallelizable
-              Accelerator kernel generated
                Generating Tesla code
-, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
+, #pragma acc loop gang /* blockIdx.x */
-, Loop is parallelizable
+, #pragma acc loop vector(128) /* threadIdx.x */
+                  Generating implicit reduction(+:sum)
+, Loop is parallelizable
 }}
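For reference, the kind of change that leads to the output above can be sketched as follows. This is only an illustration, not the tutorial's actual source: the function name <code>matvec_csr</code> and its flat-pointer signature are hypothetical, the variable names are taken from the compiler messages, and it is an assumption that only <code>ycoefs</code> needs the restrict qualifier. <code>__restrict</code> is a compiler extension (accepted by nvc++, GCC and Clang) that stands in for C's <code>restrict</code> keyword in C++. The sketch also assumes managed memory, as requested by <code>-ta=tesla:managed</code>, so no explicit data clauses are written.

```cpp
// Hypothetical sketch of a CSR sparse matrix-vector product y = A * x.
// Names mirror the compiler output; the signature is an assumption.
void matvec_csr(int num_rows,
                const int* row_offsets,   // num_rows + 1 entries
                const int* cols,          // column index of each nonzero
                const double* Acoefs,     // value of each nonzero
                const double* xcoefs,     // input vector
                double* __restrict ycoefs) // output vector, promised not to alias the inputs
{
    // Without an OpenACC compiler this pragma is ignored and the loops run serially.
    #pragma acc kernels
    {
        for (int i = 0; i < num_rows; i++) {
            double sum = 0.0;
            const int row_start = row_offsets[i];
            const int row_end = row_offsets[i + 1];
            // Inner loop is a sum reduction over the nonzeros of row i.
            for (int j = row_start; j < row_end; j++) {
                sum += Acoefs[j] * xcoefs[cols[j]];
            }
            ycoefs[i] = sum;
        }
    }
}
```

Because the qualifier promises that writes through <code>ycoefs</code> never touch the memory read through the other pointers, the compiler no longer has to assume a dependence between iterations of the outer loop.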