Line 89: | Line 89: | ||
One example of this directive is the following code: | One example of this directive is the following code: | ||
</translate> | |||
<syntaxhighlight lang="cpp" line> | <syntaxhighlight lang="cpp" line> | ||
#pragma acc kernels | #pragma acc kernels | ||
Line 99: | Line 101: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
<translate> | |||
This example is very simple. However, code is often not that simple, and we then need to rely on compiler feedback in order to identify regions it failed to parallelize. | This example is very simple. However, code is often not that simple, and we then need to rely on compiler feedback in order to identify regions it failed to parallelize. | ||
</translate> | </translate> | ||
Line 112: | Line 115: | ||
<translate> | <translate> | ||
=== Example: porting a matrix-vector product === | |||
For this example, we use the code from the [https://github.com/calculquebec/cq-formation-openacc exercises repository]. More precisely, we will use a portion of the code from the <tt>matrix_functions.h</tt> file. The equivalent Fortran code can be found in the subroutine <tt>matvec</tt> contained in the <tt>matrix.F90</tt> file. The original code is the following: | |||
</translate> | |||
<syntaxhighlight lang="cpp" line> | |||
for(int i=0;i<num_rows;i++) { | |||
double sum=0; | |||
int row_start=row_offsets[i]; | |||
int row_end=row_offsets[i+1]; | |||
for(int j=row_start;j<row_end;j++) { | |||
unsigned int Acol=cols[j]; | |||
double Acoef=Acoefs[j]; | |||
double xcoef=xcoefs[Acol]; | |||
sum+=Acoef*xcoef; | |||
} | |||
ycoefs[i]=sum; | |||
} | |||
</syntaxhighlight> | |||
<translate> | |||
The first change we make to this code to try to run it on the GPU is to add the <tt>kernels</tt> directive. At this stage, we don't worry about data transfer, or about giving more information to the compiler. | |||
</translate> | |||
<syntaxhighlight lang="cpp" line> | |||
#pragma acc kernels | |||
{ | |||
for(int i=0;i<num_rows;i++) { | |||
double sum=0; | |||
int row_start=row_offsets[i]; | |||
int row_end=row_offsets[i+1]; | |||
for(int j=row_start;j<row_end;j++) { | |||
unsigned int Acol=cols[j]; | |||
double Acoef=Acoefs[j]; | |||
double xcoef=xcoefs[Acol]; | |||
sum+=Acoef*xcoef; | |||
} | |||
ycoefs[i]=sum; | |||
} | |||
} | |||
</syntaxhighlight> | |||
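<translate>
To make the data layout concrete, here is a minimal self-contained sketch of the same loop wrapped in a function, together with a small 3x3 matrix stored in compressed sparse row (CSR) form. The function name <tt>matvec</tt> follows the Fortran subroutine mentioned above, but its C++ signature and the <tt>example_*</tt> arrays are assumptions made for illustration; they are not taken from <tt>matrix_functions.h</tt>. A compiler without OpenACC support simply ignores the pragma and runs the loop serially, which gives the same result. An OpenACC compiler may need more information about array sizes than this sketch provides, since data transfer is deliberately left out at this stage.
</translate>

```cpp
// Illustrative 3x3 matrix in CSR form (an assumption, not data from the repository):
//   [ 2 0 1 ]
//   [ 0 3 0 ]
//   [ 4 0 5 ]
// row_offsets[i] .. row_offsets[i+1] delimit the non-zeros of row i.
const int example_num_rows = 3;
const int example_row_offsets[] = {0, 2, 3, 5};
const unsigned int example_cols[] = {0, 2, 1, 0, 2};
const double example_Acoefs[] = {2.0, 1.0, 3.0, 4.0, 5.0};
const double example_xcoefs[] = {1.0, 2.0, 3.0};

// Hypothetical wrapper around the loop shown above: computes y = A * x.
void matvec(int num_rows, const int *row_offsets, const unsigned int *cols,
            const double *Acoefs, const double *xcoefs, double *ycoefs) {
#pragma acc kernels
  {
    for (int i = 0; i < num_rows; i++) {
      double sum = 0;
      int row_start = row_offsets[i];
      int row_end = row_offsets[i + 1];
      // Accumulate the contribution of each non-zero in row i.
      for (int j = row_start; j < row_end; j++) {
        unsigned int Acol = cols[j];
        double Acoef = Acoefs[j];
        double xcoef = xcoefs[Acol];
        sum += Acoef * xcoef;
      }
      ycoefs[i] = sum;
    }
  }
}
```

<translate>
For the sample data, the product is y = (2*1 + 1*3, 3*2, 4*1 + 5*3) = (5, 6, 19), which gives a quick way to check that the CPU and GPU builds agree.
</translate>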
<translate> | |||
==== Building with OpenACC ==== | |||
For the purpose of this tutorial, we use version 16.3 of the PGI compilers. We use the <tt>-ta</tt> (target accelerator) option in order to enable offloading to accelerators. | |||
</translate> | |||
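<translate>
As a rough sketch of what such a build could look like, the command below combines <tt>-ta</tt> with <tt>-Minfo=accel</tt>, which asks the PGI compiler to report what it did with each accelerator region. The source file name <tt>main.cpp</tt> and the <tt>tesla</tt> target are assumptions; adjust them to your copy of the exercises and to your hardware.
</translate>

```shell
# Hypothetical source file name; adjust to your copy of the exercises.
SRC=main.cpp

# Only attempt the build when the PGI C++ driver is on the PATH.
if command -v pgc++ >/dev/null 2>&1; then
    # -fast        : usual PGI optimization set
    # -ta=tesla    : offload to NVIDIA GPUs (assumed target)
    # -Minfo=accel : print compiler feedback about accelerator regions
    pgc++ -fast -ta=tesla -Minfo=accel "$SRC" -o "${SRC%.cpp}"
else
    echo "pgc++ not found: load the PGI compiler environment first"
fi
```

<translate>
The messages printed by <tt>-Minfo=accel</tt> are the compiler feedback on which we rely to find loops that were not parallelized.
</translate>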
{{Callout | |||
|title=<translate>Which compiler?</translate> | |||
|content= | |||
<translate> | |||
As of May 2016, compiler support for OpenACC is still relatively scarce. The standard is being pushed by [http://www.nvidia.com/content/global/global.php NVIDIA], through its [http://www.pgroup.com/ Portland Group] division, and by [http://www.cray.com/ Cray]; these two lines of compilers offer the most advanced OpenACC support. [https://gcc.gnu.org/wiki/OpenACC GNU Compiler] support for OpenACC exists, but is considered experimental in version 5. It is expected to be officially supported in version 6 of the compiler. | |||
For the purpose of this tutorial, we use version 16.3 of the Portland Group compilers. We note that [http://www.pgroup.com/support/download_pgi2016.php?view=current Portland Group compilers] are free for academic use. | |||
</translate> | |||
}} | |||
<translate> | |||
[[OpenACC Tutorial|Back to the lesson plan]] | [[OpenACC Tutorial|Back to the lesson plan]] | ||
</translate> | </translate> |