OpenACC Tutorial - Adding directives



One example of this directive is the following code:
</translate>
<syntaxhighlight lang="cpp" line>
#pragma acc kernels
</syntaxhighlight>  
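<translate>
The following is a complete, minimal sketch of a <tt>kernels</tt> region that can be compiled on its own. The function <tt>add_vectors</tt>, the arrays <tt>A</tt>, <tt>B</tt>, <tt>C</tt> and the size <tt>N</tt> are hypothetical. The arrays are global with a fixed size so that the compiler can see how much data the region uses without any data clauses. When compiled without OpenACC support, the directive is ignored and the loop runs serially, which makes the sketch easy to test.
</translate>

```cpp
#include <cassert>

// Hypothetical problem size and data. Global fixed-size arrays let the
// compiler determine how much data the kernels region reads and writes.
constexpr int N = 1000;
double A[N], B[N], C[N];

// Element-wise vector addition. The kernels directive marks the enclosed
// block as a candidate for offloading; the compiler decides whether and how
// to parallelize the loop. Without OpenACC support the pragma is ignored
// and the loop runs serially on the host.
void add_vectors() {
#pragma acc kernels
  {
    for (int i = 0; i < N; i++) {
      C[i] = A[i] + B[i];
    }
  }
}
```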


<translate>
This example is very simple. Real code is often more complex, and we then need to rely on compiler feedback to identify the regions the compiler failed to parallelize.
</translate>
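<translate>
A minimal sketch of the kind of loops that compiler feedback reports on, using the hypothetical functions <tt>scale_add</tt> and <tt>running_sum</tt>. In <tt>running_sum</tt>, each iteration reads the value written by the previous one (a loop-carried dependence), so the compiler must not parallelize it. In <tt>scale_add</tt>, the iterations are independent as long as <tt>c</tt> does not overlap <tt>a</tt> or <tt>b</tt>; because the compiler cannot always prove this for raw pointers, it may decline to parallelize that loop as well, and only its feedback tells you so. With the PGI compilers, this feedback is typically requested with <tt>-Minfo=accel</tt>. The explicit data clauses are an addition of this sketch, so that it does not rely on the compiler guessing array sizes.
</translate>

```cpp
#include <cassert>

// Independent iterations: c[i] depends only on a[i] and b[i]. This is safe to
// parallelize only if c does not overlap a or b, which the compiler may be
// unable to prove for raw pointers.
void scale_add(int n, const double* a, const double* b, double* c) {
#pragma acc kernels copyin(a[0:n], b[0:n]) copyout(c[0:n])
  for (int i = 0; i < n; i++) {
    c[i] = 2.0 * a[i] + b[i];
  }
}

// Loop-carried dependence: iteration i reads x[i-1], which iteration i-1
// has just written, so this loop must not be run in parallel as written.
void running_sum(int n, double* x) {
#pragma acc kernels copy(x[0:n])
  for (int i = 1; i < n; i++) {
    x[i] += x[i - 1];
  }
}
```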


<translate>
=== Example: porting a matrix-vector product ===
For this example, we use the code from the [https://github.com/calculquebec/cq-formation-openacc exercises repository]. More precisely, we will use a portion of the code from the <tt>matrix_functions.h</tt> file. The equivalent Fortran code can be found in the subroutine <tt>matvec</tt> contained in the <tt>matrix.F90</tt> file. The original code is the following:
</translate>
<syntaxhighlight lang="cpp" line>
for(int i=0;i<num_rows;i++) {
  double sum=0;
  int row_start=row_offsets[i];
  int row_end=row_offsets[i+1];
  for(int j=row_start;j<row_end;j++) {
    unsigned int Acol=cols[j];
    double Acoef=Acoefs[j];
    double xcoef=xcoefs[Acol];
    sum+=Acoef*xcoef;
  }
  ycoefs[i]=sum;
}
</syntaxhighlight>
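<translate>
The loop above computes the product y = A·x for a sparse matrix stored in compressed sparse row (CSR) form: the nonzeros of row <tt>i</tt> occupy positions <tt>row_offsets[i]</tt> to <tt>row_offsets[i+1]-1</tt> of <tt>Acoefs</tt>, and their column indices are stored at the same positions in <tt>cols</tt>. To experiment with the loop outside the exercises repository, it can be wrapped in a stand-alone function, as in this sketch. The name <tt>matvec_csr</tt> and the parameter types are assumptions and may differ from those used in <tt>matrix_functions.h</tt>.
</translate>

```cpp
#include <cassert>

// Hypothetical stand-alone wrapper around the CSR loop shown above.
// row_offsets has num_rows + 1 entries; row i's nonzeros are the entries
// row_offsets[i] .. row_offsets[i+1]-1 of cols (column index) and Acoefs (value).
void matvec_csr(int num_rows, const int* row_offsets, const unsigned int* cols,
                const double* Acoefs, const double* xcoefs, double* ycoefs) {
  for (int i = 0; i < num_rows; i++) {
    double sum = 0;
    int row_start = row_offsets[i];
    int row_end = row_offsets[i + 1];
    for (int j = row_start; j < row_end; j++) {
      unsigned int Acol = cols[j];   // column of this nonzero
      double Acoef = Acoefs[j];      // value of this nonzero
      double xcoef = xcoefs[Acol];   // matching entry of x
      sum += Acoef * xcoef;
    }
    ycoefs[i] = sum;
  }
}
```

For the 3×3 matrix with rows (4, 0, 1), (0, 2, 0) and (3, 0, 5), the CSR arrays are <tt>row_offsets = {0, 2, 3, 5}</tt>, <tt>cols = {0, 2, 1, 0, 2}</tt> and <tt>Acoefs = {4, 1, 2, 3, 5}</tt>; with x = (1, 2, 3), the result is y = (7, 4, 18).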
<translate>
The first change we make to this code to try to run it on the GPU is to add the <tt>kernels</tt> directive. At this stage, we don't worry about data transfer, or about giving more information to the compiler.
</translate>
<syntaxhighlight lang="cpp" line>
#pragma acc kernels
  {
    for(int i=0;i<num_rows;i++) {
      double sum=0;
      int row_start=row_offsets[i];
      int row_end=row_offsets[i+1];
      for(int j=row_start;j<row_end;j++) {
        unsigned int Acol=cols[j];
        double Acoef=Acoefs[j];
        double xcoef=xcoefs[Acol];
        sum+=Acoef*xcoef;
      }
      ycoefs[i]=sum;
    }
  }
</syntaxhighlight>
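<translate>
A self-contained sketch of this step follows, with the same loop inside a <tt>kernels</tt> region. The function name <tt>matvec_csr_acc</tt> and its parameters are assumptions. Unlike the code above, it adds <tt>copyin</tt> and <tt>copyout</tt> clauses with explicit array shapes, because for raw pointers the compiler generally cannot infer how many elements to transfer; data movement itself is not the topic of this step. The <tt>kernels</tt> directive still leaves the compiler free to run the loops sequentially if it cannot prove that the iterations are independent.
</translate>

```cpp
#include <cassert>

// Hypothetical wrapper: CSR matrix-vector product inside a kernels region.
// The data clauses state the array shapes explicitly, since the compiler
// cannot infer the length of the data behind a raw pointer.
void matvec_csr_acc(int num_rows, int num_cols, const int* row_offsets,
                    const unsigned int* cols, const double* Acoefs,
                    const double* xcoefs, double* ycoefs) {
  const int nnz = row_offsets[num_rows];  // number of stored nonzeros
#pragma acc kernels copyin(row_offsets[0:num_rows+1], cols[0:nnz], \
                           Acoefs[0:nnz], xcoefs[0:num_cols])      \
                    copyout(ycoefs[0:num_rows])
  {
    for (int i = 0; i < num_rows; i++) {
      double sum = 0;
      int row_start = row_offsets[i];
      int row_end = row_offsets[i + 1];
      for (int j = row_start; j < row_end; j++) {
        unsigned int Acol = cols[j];
        double Acoef = Acoefs[j];
        double xcoef = xcoefs[Acol];
        sum += Acoef * xcoef;
      }
      ycoefs[i] = sum;
    }
  }
}
```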
<translate>
==== Building with OpenACC ====
For the purpose of this tutorial, we use version 16.3 of the PGI compilers. We use the <tt>-ta</tt> (target accelerator) option to enable offloading to accelerators.
</translate>
{{Callout
|title=<translate>Which compiler ?</translate>
|content=
<translate>
As of May 2016, compiler support for OpenACC is still relatively scarce. OpenACC is pushed by [http://www.nvidia.com/content/global/global.php NVidia], through its [http://www.pgroup.com/ Portland Group] division, and by [http://www.cray.com/ Cray]; these two lines of compilers offer the most advanced OpenACC support. [https://gcc.gnu.org/wiki/OpenACC GNU Compiler] support for OpenACC exists, but is considered experimental in version 5; it is expected to be officially supported in version 6 of the compiler.
For the purpose of this tutorial, we use version 16.3 of the Portland Group compilers. Note that [http://www.pgroup.com/support/download_pgi2016.php?view=current Portland Group compilers] are free for academic use.
</translate>
}}
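<translate>
One way to confirm that a build actually enabled OpenACC is to test the <tt>_OPENACC</tt> preprocessor macro, which the OpenACC specification requires a compiler to define, as the <tt>yyyymm</tt> date of the specification version it supports, when OpenACC compilation is enabled. This sketch relies only on that macro; the function names <tt>openacc_version</tt> and <tt>report_openacc</tt> are hypothetical. With the GNU compiler, OpenACC is enabled with <tt>-fopenacc</tt>.
</translate>

```cpp
#include <cassert>
#include <cstdio>

// Returns the _OPENACC macro value (the yyyymm date of the supported OpenACC
// specification) when compiled with OpenACC enabled, or 0 otherwise.
long openacc_version() {
#ifdef _OPENACC
  return _OPENACC;
#else
  return 0;
#endif
}

// Prints whether this translation unit was built with OpenACC enabled.
void report_openacc() {
  const long version = openacc_version();
  if (version != 0) {
    std::printf("OpenACC enabled (_OPENACC = %ld)\n", version);
  } else {
    std::printf("OpenACC not enabled: acc directives are ignored\n");
  }
}
```

Calling <tt>report_openacc()</tt> at the start of a program built once with and once without the OpenACC options shows whether the directives are being honoured.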
<translate>
[[OpenACC Tutorial|Back to the lesson plan]]
</translate>