No edit summary |
No edit summary |
||
Line 156: | Line 156: | ||
<translate> | <translate> | ||
==== Building with OpenACC ==== | ==== Building with OpenACC ==== | ||
For the purpose of this tutorial, we use version 16.3 of the PGI compilers. We use the <tt>-ta</tt> (target accelerator) option to enable offloading to accelerators. | For the purpose of this tutorial, we use version 16.3 of the PGI compilers. We use the <tt>-ta</tt> (target accelerator) option to enable offloading to accelerators. With this option, we use the sub-option <tt>tesla:managed</tt> to tell the compiler that we want the code compiled for Tesla GPUs and that we want to use managed memory. Managed memory simplifies the process of transferring data to and from the device. We will remove this sub-option in a later example. We also use the <tt>-fast</tt> option, which turns on compiler optimizations. | ||
</translate> | </translate> | ||
{{Command | |||
|pgc++ -fast -Minfo{{=}}accel -ta{{=}}tesla:managed main.cpp -o challenge | |||
|result= | |||
... | |||
matvec(const matrix &, const vector &, const vector &): | |||
23, include "matrix_functions.h" | |||
27, Generating copyout(ycoefs[:num_rows]) | |||
Generating copyin(xcoefs[:],Acoefs[:],cols[:],row_offsets[:num_rows+1]) | |||
29, Complex loop carried dependence of row_offsets-> prevents parallelization | |||
Loop carried dependence of ycoefs-> prevents parallelization | |||
Loop carried backward dependence of ycoefs-> prevents vectorization | |||
Complex loop carried dependence of cols->,Acoefs->,xcoefs-> prevents parallelization | |||
Accelerator kernel generated | |||
Generating Tesla code | |||
33, #pragma acc loop vector(128) /* threadIdx.x */ | |||
37, Sum reduction generated for sum | |||
33, Loop is parallelizable | |||
}} | |||
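As the compiler output shows, the compiler reported loop-carried dependences that prevented it from parallelizing the loop at line 29, while it marked the loop at line 33 as parallelizable. We will see in the following sections how to deal with this. | |||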
{{Callout | {{Callout | ||
|title=<translate>Which compiler?</translate> | |title=<translate>Which compiler?</translate> |