<translate>
==== Building with OpenACC ====
{{Callout
|title=<!--T:22-->
Which compiler?</translate>
|content=
<translate>
<!--T:23-->
As of May 2021, OpenACC is supported by many compilers. The compilers from [http://www.nvidia.com/content/global/global.php NVidia] and [http://www.cray.com/ Cray], the two vendors driving the standard, offer the most mature OpenACC support. The [https://gcc.gnu.org/wiki/OpenACC GNU compilers] have supported OpenACC since version 5, with support improving in each release.
<!--T:24-->
For the purpose of this tutorial, we use version 20.7 of the NVidia HPC compilers.
}}
The NVidia compilers use the <tt>-ta</tt> (target accelerator) option to enable compilation for an accelerator. We use the sub-option <tt>tesla:managed</tt> to tell the compiler to compile for Tesla GPUs and to use managed memory. Managed memory simplifies the process of transferring data to and from the device; we will remove this option in a later example. We also use the <tt>-fast</tt> option, which enables a set of common compiler optimizations.
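Put together, a compilation command for this tutorial might look like the following sketch. The source and output file names are placeholders, and <tt>-Minfo=accel</tt> (which asks the NVidia compilers to report what they did with each loop, producing messages like those shown below) is an assumption about how the diagnostic output was obtained:

```bash
# main.c and program are hypothetical names; adjust to your own files.
nvc -fast -ta=tesla:managed -Minfo=accel -o program main.c
```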
</translate>
Generating implicit reduction(+:sum)
34, Loop is parallelizable
}}
<translate>
As we can see in the compiler output, the compiler could not parallelize the two loops. The following sections show how to deal with this.
== Fixing false loop dependencies == <!--T:25-->