OpenACC Tutorial - Adding directives: Difference between revisions

Revision as of 16:53, 9 May 2016

2,528 bytes added , 8 years ago

no edit summary

Mboisson

@@ Line 89: / Line 89: @@
 One example of this directive is the following code:
+</translate>
 <syntaxhighlight lang="cpp" line>
 #pragma acc kernels
@@ Line 99: / Line 101: @@
 </syntaxhighlight>
+<translate>
 This example is very simple. However, real code is often more complex, and we then need to rely on compiler feedback to identify the regions it failed to parallelize.
 </translate>
@@ Line 112: / Line 115: @@
 <translate>
+=== Example: porting a matrix-vector product ===
+For this example, we use the code from the [https://github.com/calculquebec/cq-formation-openacc exercises repository]. More precisely, we will use a portion of the code from the <tt>matrix_functions.h</tt> file. The equivalent Fortran code can be found in the subroutine <tt>matvec</tt> contained in the <tt>matrix.F90</tt> file. The original code is the following:
+</translate>
+<syntaxhighlight lang="cpp" line>
+for(int i=0;i<num_rows;i++) {
+  double sum=0;
+  int row_start=row_offsets[i];
+  int row_end=row_offsets[i+1];
+  for(int j=row_start;j<row_end;j++) {
+    unsigned int Acol=cols[j];
+    double Acoef=Acoefs[j];
+    double xcoef=xcoefs[Acol];
+    sum+=Acoef*xcoef;
+  }
+  ycoefs[i]=sum;
+}
+</syntaxhighlight>
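<translate>
The loop above assumes a matrix stored in compressed sparse row (CSR) form. As a minimal sketch of what the arrays contain, the following wraps the same loop in a function and builds a small 3x3 matrix. The <tt>CsrMatrix</tt> struct and the <tt>example_matrix</tt> helper are hypothetical names introduced here for illustration; they are not taken from the exercises repository.
</translate>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: field names mirror the arrays used in the loop above.
struct CsrMatrix {
  int num_rows;
  std::vector<unsigned int> row_offsets; // num_rows + 1 entries
  std::vector<unsigned int> cols;        // column index of each stored coefficient
  std::vector<double> Acoefs;            // value of each stored coefficient
};

// Computes y = A * x with the same loop nest as the original code.
std::vector<double> matvec(const CsrMatrix &A, const std::vector<double> &xcoefs) {
  assert(A.row_offsets.size() == static_cast<std::size_t>(A.num_rows) + 1);
  std::vector<double> ycoefs(A.num_rows);
  for (int i = 0; i < A.num_rows; i++) {
    double sum = 0;
    int row_start = A.row_offsets[i];
    int row_end = A.row_offsets[i + 1];
    for (int j = row_start; j < row_end; j++) {
      unsigned int Acol = A.cols[j];
      double Acoef = A.Acoefs[j];
      double xcoef = xcoefs[Acol];
      sum += Acoef * xcoef;
    }
    ycoefs[i] = sum;
  }
  return ycoefs;
}

// Illustrative matrix:
//   [2 0 1]
//   [0 3 0]
//   [4 0 5]
CsrMatrix example_matrix() {
  return CsrMatrix{3, {0, 2, 3, 5}, {0, 2, 1, 0, 2}, {2.0, 1.0, 3.0, 4.0, 5.0}};
}
```

<translate>
Row <tt>i</tt> owns the entries from <tt>row_offsets[i]</tt> up to, but not including, <tt>row_offsets[i+1]</tt>, so the inner loop has a different trip count for each row, and <tt>xcoefs</tt> is read through the indirect index <tt>cols[j]</tt>.
</translate>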
+<translate>
+To try running this code on the GPU, the first change we make is to add the <tt>kernels</tt> directive. At this stage, we do not worry about data transfers or about giving the compiler more information.
+</translate>
+<syntaxhighlight lang="cpp" line>
+#pragma acc kernels
+  {
+    for(int i=0;i<num_rows;i++) {
+      double sum=0;
+      int row_start=row_offsets[i];
+      int row_end=row_offsets[i+1];
+      for(int j=row_start;j<row_end;j++) {
+        unsigned int Acol=cols[j];
+        double Acoef=Acoefs[j];
+        double xcoef=xcoefs[Acol];
+        sum+=Acoef*xcoef;
+      }
+      ycoefs[i]=sum;
+    }
+  }
+</syntaxhighlight>
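<translate>
As a sketch of how the annotated loop could sit inside a complete function, the following uses plain pointers and a hypothetical function name, <tt>matvec_kernels</tt>. A compiler without OpenACC support ignores the pragma, so the function still produces the same result on the CPU. With bare pointers, an OpenACC compiler may be unable to determine transfer sizes or to prove that the loops are independent; at this stage that is expected, and the compiler feedback is what reveals it.
</translate>

```cpp
#include <cassert>

// Hypothetical wrapper around the annotated loop nest.
// Without OpenACC support, the pragma is ignored and the loops run serially.
void matvec_kernels(int num_rows,
                    const unsigned int *row_offsets,
                    const unsigned int *cols,
                    const double *Acoefs,
                    const double *xcoefs,
                    double *ycoefs) {
  assert(num_rows >= 0);
#pragma acc kernels
  {
    for (int i = 0; i < num_rows; i++) {
      double sum = 0;
      int row_start = row_offsets[i];
      int row_end = row_offsets[i + 1];
      for (int j = row_start; j < row_end; j++) {
        unsigned int Acol = cols[j];
        double Acoef = Acoefs[j];
        double xcoef = xcoefs[Acol];
        sum += Acoef * xcoef;
      }
      ycoefs[i] = sum;
    }
  }
}
```

<translate>
Keeping a serial build of the same source is a convenient way to compare results before and after adding directives.
</translate>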
+<translate>
+==== Building with OpenACC ====
+For the purpose of this tutorial, we use version 16.3 of the PGI compilers. We use the <tt>-ta</tt> (target accelerator) option to enable offloading to accelerators.
+</translate>
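<translate>
With the PGI compilers, an invocation along the lines of <tt>pgc++ -fast -ta=tesla -Minfo=accel main.cpp</tt> is typical; the file name and the choice of sub-options are assumptions here and depend on the installation. One simple way to check from within the program that OpenACC was actually enabled is the standard <tt>_OPENACC</tt> macro, which a compliant compiler defines when directives are being processed. The helper name below is hypothetical.
</translate>

```cpp
// Returns the value of the _OPENACC version macro (a date of the form yyyymm)
// when the code was compiled with OpenACC enabled, and 0 otherwise.
long openacc_version() {
#ifdef _OPENACC
  return _OPENACC;
#else
  return 0;
#endif
}

// Convenience predicate built on the helper above.
bool built_with_openacc() {
  return openacc_version() != 0;
}
```

<translate>
If <tt>built_with_openacc()</tt> returns <tt>false</tt>, the directives were ignored and the code ran on the CPU only.
</translate>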
+{{Callout
+|title=<translate>Which compiler?</translate>
+|content=
+<translate>
+As of May 2016, compiler support for OpenACC is still relatively scarce. OpenACC is being pushed by [http://www.nvidia.com/content/global/global.php NVidia], through its [http://www.pgroup.com/ Portland Group] division, as well as by [http://www.cray.com/ Cray], and these two lines of compilers offer the most advanced OpenACC support. [https://gcc.gnu.org/wiki/OpenACC GNU Compiler] support for OpenACC exists, but is considered experimental in version 5. It is expected to be officially supported in version 6 of the compiler.
+For the purpose of this tutorial, we use version 16.3 of the Portland Group compilers. Note that the [http://www.pgroup.com/support/download_pgi2016.php?view=current Portland Group compilers] are free for academic use.
+</translate>
+}}
+<translate>
 [[OpenACC Tutorial|Back to the lesson plan]]
 </translate>