OpenACC Tutorial - Adding directives

|content=
<translate>
Those who have used [[OpenMP]] before will be familiar with the directive-based nature of OpenACC. There is, however, one major difference between OpenMP and OpenACC directives. OpenMP directives are by design ''prescriptive'': the compiler is required to perform the requested parallelization, whether or not it is good from a performance standpoint. This yields very reproducible results from one compiler to the next, and it means that the parallelization is performed the same way whatever hardware the code runs on. However, not every architecture performs best with code written the same way; for example, it may sometimes be beneficial to switch the order of loops. To parallelize code with OpenMP so that it performs optimally on several different architectures, one would have to write a different set of directives for each architecture.

In contrast, many of OpenACC's directives are ''descriptive'': the compiler is free to compile the code whichever way it thinks is best for the target architecture, which may even mean not parallelizing it at all. The '''same code''', compiled to run on a GPU, a Xeon Phi, or a CPU, may therefore yield different binary code. This, of course, means that different compilers may yield different performance. It also means that new generations of compilers will do better than previous generations, especially with new hardware.
</translate>
}}
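To make the contrast concrete, here is a minimal sketch (not taken from this tutorial; the <code>saxpy</code> functions and their arguments are illustrative) of the same loop annotated both ways. The OpenMP version must run in parallel across CPU threads; the OpenACC version leaves the decision to the compiler, which may even choose not to parallelize at all, for instance if it cannot prove that the pointers do not alias, as discussed below.

<syntaxhighlight lang="c">
/* Prescriptive: OpenMP requires the compiler to split this loop
   across CPU threads, exactly as requested. */
void saxpy_omp(int n, float a, float *x, float *y) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Descriptive: OpenACC 'kernels' merely marks the region as a
   candidate; the compiler decides whether and how to parallelize
   it for the target architecture (GPU, CPU, ...). */
void saxpy_acc(int n, float a, float *x, float *y) {
    #pragma acc kernels
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
</syntaxhighlight>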


<translate>
== Fixing false loop dependencies ==
Sometimes the compiler cannot tell that a loop can be parallelized, even when it is obvious to the programmer. One common case in C and C++ is ''[https://en.wikipedia.org/wiki/Pointer_aliasing pointer aliasing]''. Unlike Fortran, C and C++ do not formally have arrays; they have pointers. Two pointers are said to be ''aliased'' if they point to the same memory. Unless the compiler can prove that pointers are not aliased, it must assume that they are. Going back to the previous example, it becomes obvious why the compiler could not parallelize the loop: if the pointers are aliased, there is a genuine dependence between loop iterations.
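As a hedged sketch (the function and variable names below are illustrative, not from the tutorial), C99 offers the <code>restrict</code> qualifier to promise the compiler that pointers do not alias, which removes the false dependence; alternatively, the <code>independent</code> clause of the OpenACC <code>loop</code> directive makes the same assertion for one specific loop.

<syntaxhighlight lang="c">
/* 'restrict' (C99) promises the compiler that x, y and out never
   point to overlapping memory, so iterations are independent and
   the loop may be parallelized. */
void add_vectors(int n, const float *restrict x,
                 const float *restrict y, float *restrict out) {
    #pragma acc kernels
    for (int i = 0; i < n; i++)
        out[i] = x[i] + y[i];
}

/* Same assertion made on the loop itself: 'independent' tells the
   compiler to assume that iterations do not depend on one another,
   whatever its aliasing analysis says. */
void add_vectors_indep(int n, const float *x, const float *y, float *out) {
    #pragma acc kernels loop independent
    for (int i = 0; i < n; i++)
        out[i] = x[i] + y[i];
}
</syntaxhighlight>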