Line 175: | Line 175: | ||
{{Command | {{Command | ||
| | |nvc++ -fast -Minfo{{=}}accel -ta{{=}}tesla:managed main.cpp -o challenge | ||
|result= | |result= | ||
... | ... | ||
matvec(const matrix &, const vector &, const vector &): | matvec(const matrix &, const vector &, const vector &): | ||
23, include "matrix_functions.h" | 23, include "matrix_functions.h" | ||
27, Generating copyout(ycoefs[:num_rows]) | 27, Generating implicit copyin(xcoefs[:]) [if not already present] | ||
Generating copyin( | Generating implicit copyout(ycoefs[:num_rows]) [if not already present] | ||
Generating implicit copyin(row_offsets[:num_rows+1],Acoefs[:],cols[:]) [if not already present] | |||
30, Loop carried dependence of ycoefs-> prevents parallelization | |||
Loop carried backward dependence of ycoefs-> prevents vectorization | Loop carried backward dependence of ycoefs-> prevents vectorization | ||
Complex loop carried dependence of | Complex loop carried dependence of Acoefs->,xcoefs-> prevents parallelization | ||
Generating Tesla code | Generating Tesla code | ||
30, #pragma acc loop seq | |||
34, #pragma acc loop vector(128) /* threadIdx.x */ | |||
Generating implicit reduction(+:sum) | |||
34, Loop is parallelizable | |||
}} | }} | ||
As we can see in the compiler output, the compiler could not parallelize the two loops. We will see in the following sections how to deal with this. | As we can see in the compiler output, the compiler could not parallelize the two loops. We will see in the following sections how to deal with this. |
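For orientation, the loop nest that these messages refer to probably looks something like the sketch below. This is an assumption rather than the tutorial's actual code: the <code>matrix</code> and <code>vector</code> types from <code>matrix_functions.h</code> are not shown here, so <code>CsrMatrix</code> and <code>matvec_sketch</code> are hypothetical stand-ins, and only the names <code>row_offsets</code>, <code>cols</code>, <code>Acoefs</code>, <code>xcoefs</code>, <code>ycoefs</code>, <code>num_rows</code> and <code>sum</code> are taken from the compiler output. The outer loop walks the rows and writes through <code>ycoefs</code>, while the inner loop accumulates <code>sum</code>, which matches the implicit <code>reduction(+:sum)</code> reported above. One plausible reason for the dependence messages is that the compiler cannot prove that the raw pointers do not overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the tutorial's matrix type (not shown in this
// section): a sparse matrix in compressed sparse row (CSR) layout.
struct CsrMatrix {
    std::size_t num_rows;
    std::vector<int> row_offsets;  // num_rows + 1 entries
    std::vector<int> cols;         // column index of each stored value
    std::vector<double> Acoefs;    // stored values
};

// Hypothetical sketch of a CSR matrix-vector product, y = A * x.
void matvec_sketch(const CsrMatrix &A, const std::vector<double> &x,
                   std::vector<double> &y) {
    assert(A.row_offsets.size() == A.num_rows + 1);
    assert(y.size() >= A.num_rows);

    const int num_rows = static_cast<int>(A.num_rows);
    const int *row_offsets = A.row_offsets.data();
    const int *cols = A.cols.data();
    const double *Acoefs = A.Acoefs.data();
    const double *xcoefs = x.data();
    double *ycoefs = y.data();

// The directive is only emitted when OpenACC is enabled, so the same file
// also builds as plain serial C++ with any other compiler.
#ifdef _OPENACC
#pragma acc kernels
#endif
    {
        for (int i = 0; i < num_rows; ++i) {        // outer loop over rows
            double sum = 0.0;
            for (int j = row_offsets[i]; j < row_offsets[i + 1]; ++j) {
                sum += Acoefs[j] * xcoefs[cols[j]];  // inner reduction on sum
            }
            ycoefs[i] = sum;                         // write through ycoefs
        }
    }
}
```

Without OpenACC enabled, the function runs serially and gives the same result, which makes it easy to check the arithmetic before looking at the accelerator feedback.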