OpenACC Tutorial - Adding directives: Difference between revisions

Line 175:
 {{Command
-|pgc++ -fast -Minfo{{=}}accel -ta{{=}}tesla:managed main.cpp -o challenge
+|nvc++ -fast -Minfo{{=}}accel -ta{{=}}tesla:managed main.cpp -o challenge
 |result=
 ...
 matvec(const matrix &, const vector &, const vector &):
      23, include "matrix_functions.h"
-           27, Generating copyout(ycoefs[:num_rows])
-               Generating copyin(xcoefs[:],Acoefs[:],cols[:],row_offsets[:num_rows+1])
-           29, Complex loop carried dependence of row_offsets-> prevents parallelization
-               Loop carried dependence of ycoefs-> prevents parallelization
+           27, Generating implicit copyin(xcoefs[:]) [if not already present]
+               Generating implicit copyout(ycoefs[:num_rows]) [if not already present]
+               Generating implicit copyin(row_offsets[:num_rows+1],Acoefs[:],cols[:]) [if not already present]
+           30, Loop carried dependence of ycoefs-> prevents parallelization
                Loop carried backward dependence of ycoefs-> prevents vectorization
-               Complex loop carried dependence of cols->,Acoefs->,xcoefs-> prevents parallelization
-               Accelerator kernel generated
+               Complex loop carried dependence of Acoefs->,xcoefs-> prevents parallelization
                Generating Tesla code
-               33, #pragma acc loop vector(128) /* threadIdx.x */
-               37, Sum reduction generated for sum
-           33, Loop is parallelizable
+               30, #pragma acc loop seq
+               34, #pragma acc loop vector(128) /* threadIdx.x */
+                   Generating implicit reduction(+:sum)
+           34, Loop is parallelizable
 }}
 As we can see in the compiler output, the compiler could not parallelize the two loops. We will see in the following sections how to deal with this.