OpenACC Tutorial - Adding directives: Difference between revisions

Reviewed Building with OpenACC
(Partly reviewed the kernels directive section)
(Reviewed Building with OpenACC)
Line 159: Line 159:
=== Example: porting a matrix-vector product === <!--T:19-->
=== Example: porting a matrix-vector product === <!--T:19-->
For this example, we use the code from the [https://github.com/calculquebec/cq-formation-openacc exercises repository].
For this example, we use the code from the [https://github.com/calculquebec/cq-formation-openacc exercises repository].
More precisely, we will use a portion of the code from the [https://github.com/calculquebec/cq-formation-openacc/blob/main/cpp/matrix_functions.h#L16 <code>cpp/matrix_functions.h</code> file].
More precisely, we will use a portion of the code from the [https://github.com/calculquebec/cq-formation-openacc/blob/main/cpp/matrix_functions.h#L20 <code>cpp/matrix_functions.h</code> file].
The equivalent Fortran code can be found in the subroutine [https://github.com/calculquebec/cq-formation-openacc/blob/main/f90/matrix.F90#L101 <code>matvec</code> contained in the <code>matrix.F90</code> file].
The equivalent Fortran code can be found in the subroutine [https://github.com/calculquebec/cq-formation-openacc/blob/main/f90/matrix.F90#L101 <code>matvec</code> contained in the <code>matrix.F90</code> file].
The C++ code is the following:
The C++ code is the following:
Line 180: Line 180:
<translate>
<translate>
<!--T:20-->
<!--T:20-->
The [https://github.com/calculquebec/cq-formation-openacc/blob/main/cpp/step1.kernels/matrix_functions.h#L29 first change] we make to this code to try to run it on the GPU is to add the <code>kernels</code> directive.
The [https://github.com/calculquebec/cq-formation-openacc/blob/main/cpp/step1.kernels/matrix_functions.h#L29 first change] we make to this code in order to try to run it on the GPU is to add the <code>kernels</code> directive.
At this stage, we don't worry about data transfer, or about giving more information to the compiler.
At this stage, we don't worry about data transfer, or about giving more information to the compiler.
</translate>
</translate>
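As a rough illustration of this step, here is a minimal sketch of a sparse matrix-vector product wrapped in a <code>kernels</code> region. It is not the repository's code: the <code>CsrMatrix</code> struct and the <code>matvec_kernels</code> function are hypothetical stand-ins, and the array names are borrowed from the compiler feedback shown on this page. The compressed-row storage layout is an assumption. A compiler without OpenACC support ignores the pragma and runs the loops serially.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the tutorial's matrix type (assumed CSR layout).
struct CsrMatrix {
    unsigned int num_rows;
    std::vector<unsigned int> row_offsets; // num_rows + 1 entries
    std::vector<unsigned int> cols;        // column index of each stored value
    std::vector<double> coefs;             // stored non-zero values
};

// Computes y = A * x. The only OpenACC change is the kernels directive:
// no data clauses and no extra hints are given to the compiler yet.
void matvec_kernels(const CsrMatrix &A, const std::vector<double> &x,
                    std::vector<double> &y)
{
    assert(y.size() == A.num_rows);
    assert(A.row_offsets.size() == A.num_rows + 1);

    // Raw pointers, so the compiler sees plain arrays inside the region.
    const unsigned int num_rows = A.num_rows;
    const unsigned int *row_offsets = A.row_offsets.data();
    const unsigned int *cols = A.cols.data();
    const double *Acoefs = A.coefs.data();
    const double *xcoefs = x.data();
    double *ycoefs = y.data();

#pragma acc kernels
    {
        for (unsigned int i = 0; i < num_rows; i++) {
            double sum = 0.0;
            const unsigned int row_start = row_offsets[i];
            const unsigned int row_end = row_offsets[i + 1];
            for (unsigned int j = row_start; j < row_end; j++) {
                sum += Acoefs[j] * xcoefs[cols[j]];
            }
            ycoefs[i] = sum;
        }
    }
}
```

Because the region only sees raw pointers, the compiler has to assume that <code>ycoefs</code> might alias the input arrays, which is one plausible reason for it to report loop-carried dependencies on such code.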
Line 204: Line 204:
<translate>
<translate>
==== Building with OpenACC ==== <!--T:49-->
==== Building with OpenACC ==== <!--T:49-->
<!--T:22-->
{{Callout
|title=
Which compiler?</translate>
|content=
<translate>
<!--T:23-->
As of May 2021, many compilers support OpenACC. The compilers from [http://www.nvidia.com/content/global/global.php NVidia] and [http://www.cray.com/ Cray], the two vendors pushing OpenACC, offer the most advanced support. The [https://gcc.gnu.org/wiki/OpenACC GNU Compiler] has supported OpenACC since version 5, with support improving in every release.
<!--T:24-->
For the purpose of this tutorial, we use version 20.7 of the NVidia HPC compilers.
}}


<!--T:50-->
<!--T:50-->
The NVidia compilers use the <tt>-ta</tt> (target accelerator) option to enable compilation for an accelerator. We use the sub-option <tt>tesla:managed</tt>, to tell the compiler that we want it compiled for Tesla GPUs, and we want to use managed memory. Managed memory simplifies the process of transferring data to and from the device. We will remove this option in a later example. We also use the option <tt>-fast</tt>, which is an optimization option.
The NVidia compilers use the <code>-ta</code> (target accelerator) option to enable compilation for an accelerator.
We use the sub-option <code>tesla:managed</code> to tell the compiler that we want the code compiled for Tesla GPUs,
and that we want to use [https://developer.nvidia.com/blog/unified-memory-cuda-beginners/ managed memory].
''Managed memory'' simplifies the process of transferring data to and from the device.
We will remove this option in a later example.
We also use the <code>-fast</code> option, which enables a set of general-purpose optimizations.
</translate>
</translate>


Line 228: Line 220:
matvec(const matrix &, const vector &, const vector &):
matvec(const matrix &, const vector &, const vector &):
     23, include "matrix_functions.h"
     23, include "matrix_functions.h"
           27, Generating implicit copyin(xcoefs[:]) [if not already present]
           30, Generating implicit copyin(cols[:],row_offsets[:num_rows+1],Acoefs[:]) [if not already present]
               Generating implicit copyout(ycoefs[:num_rows]) [if not already present]
               Generating implicit copyout(ycoefs[:num_rows]) [if not already present]
               Generating implicit copyin(row_offsets[:num_rows+1],Acoefs[:],cols[:]) [if not already present]
               Generating implicit copyin(xcoefs[:]) [if not already present]
           30, Loop carried dependence of ycoefs-> prevents parallelization
           31, Loop carried dependence of ycoefs-> prevents parallelization
               Loop carried backward dependence of ycoefs-> prevents vectorization
               Loop carried backward dependence of ycoefs-> prevents vectorization
               Complex loop carried dependence of Acoefs->,xcoefs-> prevents parallelization
               Complex loop carried dependence of Acoefs->,xcoefs-> prevents parallelization
               Generating Tesla code
               Generating NVIDIA GPU code
               30, #pragma acc loop seq
               31, #pragma acc loop seq
               34, #pragma acc loop vector(128) /* threadIdx.x */
               35, #pragma acc loop vector(128) /* threadIdx.x */
                   Generating implicit reduction(+:sum)
                   Generating implicit reduction(+:sum)
           34, Loop is parallelizable
           35, Loop is parallelizable
}}
}}


<translate>
<translate>
<!--T:51-->
<!--T:51-->
As we can see in the compiler output, the compiler could not parallelize the two loops. We will see in the following sections how to deal with this.
As we can see in the compiler output, the compiler could not parallelize the outer loop on line 31.
We will see in the following sections how to deal with those dependencies.


== Fixing false loop dependencies == <!--T:25-->
== Fixing false loop dependencies == <!--T:25-->