OpenACC Tutorial - Adding directives: Difference between revisions

<translate>
<translate>
== Fixing false loop dependencies ==
Sometimes the compiler concludes that a loop cannot be parallelized even though it is obvious to the programmer that it can. One common cause, in C and C++, is ''[https://en.wikipedia.org/wiki/Pointer_aliasing pointer aliasing]''. Unlike Fortran arrays, arrays passed to a C or C++ function are seen by that function only as pointers. Two pointers are said to be ''aliased'' if they point to the same memory. If the compiler cannot prove that pointers are not aliased, it must assume that they are. Going back to the previous example, it becomes clear why the compiler could not parallelize the loop: if the pointers may overlap in memory, a value written in one iteration may be read in another, which creates a dependence between loop iterations.
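
As a minimal sketch of the problem, assume a hypothetical function <tt>vector_add</tt> that wraps the loop from the previous example (the name and signature are illustrative only). When it is called on an overlapping buffer, each iteration reads the value that the previous iteration just wrote, so the result depends on the order of the iterations:

```cpp
#include <cassert>
#include <vector>

// Hypothetical wrapper around the loop from the previous example.
// Nothing in the signature tells the compiler that C is distinct from A or B.
void vector_add(const float* A, const float* B, float* C, int n)
{
  assert(n >= 0);
  for (int i = 0; i < n; i++)
  {
    C[i] = A[i] + B[i];
  }
}

// Calls vector_add with C aliased to A shifted by one element:
// iteration i writes buf[i+1], which iteration i+1 then reads.
std::vector<float> aliased_call()
{
  std::vector<float> buf = {1, 0, 0, 0, 0};
  std::vector<float> ones = {1, 1, 1, 1};
  vector_add(buf.data(), ones.data(), buf.data() + 1, 4);
  return buf; // sequential execution gives 1, 2, 3, 4, 5
}
```

Had the four iterations run independently on the original values, the buffer would instead hold 1, 2, 1, 1, 1. This is why the compiler refuses to parallelize the loop unless it knows the pointers do not overlap.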


=== <tt>restrict</tt> keyword ===
One way to tell the compiler that pointers are '''not''' going to be aliased is to use a special keyword. In C, the keyword <tt>restrict</tt> was introduced in C99 for this purpose. In C++, there is no standard way yet, but each compiler typically has its own keyword: either <tt>__restrict</tt> or <tt>__restrict__</tt>, depending on the compiler. For Portland Group compilers, the keyword is <tt>__restrict</tt>. For an explanation of why there is no standard way to do this in C++, you can read [http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3988.pdf this paper]. This concept is important not only for OpenACC but for any C/C++ programming, since many more optimizations can be done when pointers are guaranteed not to be aliased.
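
A minimal sketch of how the keyword might be applied, assuming a compiler that accepts the <tt>__restrict</tt> spelling (replace it with <tt>__restrict__</tt> or C99 <tt>restrict</tt> as appropriate). The function name <tt>vector_add_restrict</tt> is hypothetical, and compilers built without OpenACC support simply ignore the pragma:

```cpp
#include <cassert>

// Hypothetical example: each pointer is declared __restrict, which is the
// programmer's promise that the three arrays do not overlap.
void vector_add_restrict(const float* __restrict A,
                         const float* __restrict B,
                         float* __restrict C,
                         int n)
{
  assert(n >= 0);
#pragma acc kernels
  {
    for (int i = 0; i < n; i++)
    {
      C[i] = A[i] + B[i];
    }
  }
}
```

The caller remains responsible for passing three non-overlapping arrays. The keyword does not check anything at run time.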
</translate>


{{Callout
|title=<translate>What does <tt>restrict</tt> really mean?</translate>
|content=
<translate>
Declaring a pointer as restricted formally means that for "the lifetime of the pointer, only it or a value derived from it (such as <tt>ptr +1</tt>) will be used to access the object to which it points". This is a guarantee that the ''programmer'' gives to the ''compiler''. If the programmer violates this guarantee, behaviour is undefined. For more information on this concept, see this [https://en.wikipedia.org/wiki/Restrict Wikipedia article].
</translate>
}}


<translate>
=== Loop directive with independent clause ===
Another way to tell the compiler that loop iterations are independent is to state it explicitly with a different directive: <tt>loop</tt>, with the clause <tt>independent</tt>. This is a ''prescriptive'' directive. Like any prescriptive directive, it tells the compiler what to do and overrides any compiler analysis. The initial example above would become:
<syntaxhighlight lang="cpp" line>
#pragma acc kernels
{
#pragma acc loop independent
for (int i=0; i<N; i++)
{
  C[i] = A[i] + B[i];
}
}
</syntaxhighlight>
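
For context, the fragment above could sit inside a function such as the following sketch. The name <tt>vector_add_independent</tt> and its signature are assumptions for illustration. Since the clause overrides the compiler's analysis, it is up to the caller to pass arrays that do not overlap:

```cpp
#include <cassert>

// Hypothetical wrapper: the pointers are not marked restrict, so the
// independent clause alone carries the promise that iterations do not interact.
void vector_add_independent(const float* A, const float* B, float* C, int N)
{
  assert(N >= 0);
#pragma acc kernels
  {
#pragma acc loop independent
    for (int i = 0; i < N; i++)
    {
      C[i] = A[i] + B[i];
    }
  }
}
```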


[[OpenACC Tutorial|Back to the lesson plan]]
</translate>
</translate>