Line 82: | Line 82: | ||
<translate> | <translate> | ||
== The <tt>kernels</tt> directive == | == The <tt>kernels</tt> directive == | ||
The <tt>kernels</tt> directive is what we call a ''descriptive'' directive. It is used to tell the compiler that the programmer thinks this region can be made parallel. At this point, the compiler is free to do whatever it wants with this information. Typically, it will | The <tt>kernels</tt> directive is what we call a ''descriptive'' directive. It is used to tell the compiler that the programmer thinks this region can be made parallel. At this point, the compiler is free to do whatever it wants with this information. It can use whichever strategy it thinks is best to run the code, ''including'' running it sequentially. Typically, it will | ||
# Analyze the code to try to identify parallelism | # Analyze the code to try to identify parallelism | ||
# If found, identify which data must be transferred and when | # If found, identify which data must be transferred and when | ||
Line 99: | Line 99: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
This example is very simple. Real code is often more complicated, and we then need to rely on compiler feedback to identify the regions the compiler failed to parallelize. | |||
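As an illustration of a loop the compiler may decline to parallelize, here is a minimal sketch in C. The function names <tt>saxpy_may_alias</tt> and <tt>saxpy_restrict</tt> are hypothetical and do not come from the tutorial. In the first version the compiler has to assume that <tt>x</tt> and <tt>y</tt> might overlap, so it may not be able to prove that the iterations are independent. The second version uses the C99 <tt>restrict</tt> qualifier to promise that they do not overlap. Whether either loop is actually parallelized depends on the compiler and the target, which is why the feedback matters (with the NVIDIA HPC compilers, for instance, it can be requested with <tt>-Minfo=accel</tt>).

```c
#include <stddef.h>

/* Hypothetical helper: y[i] = a * x[i] + y[i].
 * Without `restrict`, the compiler must assume that x and y may overlap,
 * which can prevent it from proving that the iterations are independent. */
void saxpy_may_alias(size_t n, float a, const float *x, float *y)
{
    /* The data clauses state explicitly which elements are read and written. */
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Same loop, but `restrict` promises that x and y never overlap,
 * which gives the compiler a better chance of finding the parallelism. */
void saxpy_restrict(size_t n, float a,
                    const float *restrict x, float *restrict y)
{
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Both functions compute the same result when the arrays do not overlap. Only the information available to the compiler differs. A compiler without OpenACC support ignores the pragmas and runs the loops sequentially.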
</translate> | |||
{{Callout | |||
|title=<translate>Descriptive vs prescriptive</translate> | |||
|content= | |||
<translate> | |||
Those who have used [[OpenMP]] before will be familiar with the directive-based nature of OpenACC. There is, however, one major difference between OpenMP and OpenACC directives. OpenMP directives are by design ''prescriptive'' in nature. This means that the compiler is required to perform the requested parallelization, whether or not this is good from a performance standpoint. This yields very reproducible results from one compiler to the next. It also means that parallelization will be performed the same way, whatever hardware the code runs on. However, not every architecture performs best with code written the same way. Sometimes it may be beneficial to switch the order of loops, for example. If one were to parallelize a code with OpenMP and wanted it to perform optimally on multiple different architectures, one would have to write different sets of directives for different architectures. | |||
By contrast, many of OpenACC's directives are ''descriptive'' in nature. This means that the compiler is free to compile the code whichever way it thinks is best for the target architecture. This may even mean that the code is not parallelized at all. The '''same code''', compiled to run on GPU, or on Xeon Phi, or on CPU, may therefore yield different binary code. This, of course, means that different compilers may yield different performance. It also means that new generations of compilers will do better than previous generations, especially with new hardware. | |||
</translate> | |||
}} | |||
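The contrast described in the callout can be sketched on a single loop. This is only an illustration. The function names <tt>scale_openmp</tt> and <tt>scale_openacc</tt> are hypothetical, and the loop is deliberately trivial.

```c
#include <stddef.h>

/* Prescriptive (OpenMP): when OpenMP is enabled, the compiler is told
 * to distribute the iterations of this loop across threads. */
void scale_openmp(size_t n, double a, double *restrict v)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i) {
        v[i] = a * v[i];
    }
}

/* Descriptive (OpenACC): the programmer states that this region can
 * probably be made parallel; the compiler chooses how to run it,
 * and may even run it sequentially. */
void scale_openacc(size_t n, double a, double *restrict v)
{
    #pragma acc kernels copy(v[0:n])
    for (size_t i = 0; i < n; ++i) {
        v[i] = a * v[i];
    }
}
```

The two functions return the same values. What changes is how much of the parallelization strategy is fixed by the programmer and how much is left to the compiler.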
<translate> | |||
[[OpenACC Tutorial|Back to the lesson plan]] | [[OpenACC Tutorial|Back to the lesson plan]] | ||
</translate> | </translate> |