OpenMP

== Description ==
[http://openmp.org/wp/ OpenMP] (Open Multi-Processing) is an application programming interface for shared memory parallel computing. This API is supported on numerous platforms, including Linux and Windows, and is available for the C/C++ and Fortran programming languages. The API consists of a set of directives, a software library, and environment variables.


OpenMP allows for the rapid development of fine-grained parallel applications on a multicore machine while staying close to the serial code. Although there is only one program instance running, it can execute multiple subtasks in parallel. Directives inserted into the program control the parallel execution and allow for the management of the computations' distribution between the subtasks. The beauty of these directives is that they are usually non-intrusive: a compiler that does not support them can still compile the program, and the user can then run it as usual, serially of course.
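
As a minimal sketch of what such a directive looks like in C (the array and loop here are purely illustrative), the <tt>parallel for</tt> directive below asks OpenMP to share the loop iterations among threads, while a compiler without OpenMP support simply ignores the pragma and produces a serial program:
<syntaxhighlight lang="c">
#include <stdio.h>

#define N 1000

int main(void)
{
    double a[N];

    /* The directive below asks OpenMP to distribute the loop
       iterations among the available threads.  A compiler without
       OpenMP support simply ignores the pragma and runs the loop
       serially. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}
</syntaxhighlight>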


The OpenMP interface relies on the notion of threads, a well-known concept from operating systems. A thread is a bit like a lightweight process or a "virtual processor, operating serially", and can formally be defined as the smallest unit of work/processing that can be scheduled by an operating system. From a programmer's point of view, if there are five threads, then that corresponds virtually to five cores that can do a computation in parallel. It is important to understand that the number of threads is independent of the number of physical cores within the computer. Two cores can, for example, run a program with ten threads. The operating system decides how to share the cores' time between threads.
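
The following minimal sketch illustrates this, using the OpenMP library routines <tt>omp_get_thread_num</tt> and <tt>omp_get_num_threads</tt> (it must be compiled with OpenMP support, and the order of the output lines is not deterministic, precisely because the threads run concurrently):
<syntaxhighlight lang="c">
#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Every thread of the team executes this block once and
       reports its own identifier. */
    #pragma omp parallel
    {
        int id = omp_get_thread_num();    /* this thread's number    */
        int n  = omp_get_num_threads();   /* size of the thread team */
        printf("Hello from thread %d of %d\n", id, n);
    }
    return 0;
}
</syntaxhighlight>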


Having said that, it is clear that if you have four available cores, you should use at least four threads to be able to profit from all the available computing power, as a thread cannot be executed by two cores at the same time. It could be advantageous, in certain cases, to use more threads than the number of available cores. Using too many threads is not recommended, however.


Another important point concerning threads is synchronization. When multiple threads within the same program do computations at the same time, absolutely nothing can be assumed about the order in which things happen. If the order matters for the correctness of the code, then the necessary OpenMP synchronization directives need to be used to enforce it. The exact distribution of threads between cores also remains unknown to the programmer, although OpenMP and thread affinity settings offer some control over it.
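
As an illustration of why synchronization matters, the following sketch (with purely illustrative numbers) has every thread repeatedly updating a shared counter; without the <tt>critical</tt> directive the concurrent updates could interleave in any order and the final value would be unpredictable:
<syntaxhighlight lang="c">
#include <stdio.h>

int main(void)
{
    int counter = 0;

    #pragma omp parallel
    {
        for (int i = 0; i < 100000; i++) {
            /* Only one thread at a time may execute the statement
               below; without this directive the concurrent updates
               could interleave and the final value would be wrong. */
            #pragma omp critical
            counter++;
        }
    }

    /* Expected result: 100000 multiplied by the number of threads. */
    printf("counter = %d\n", counter);
    return 0;
}
</syntaxhighlight>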


The following link points to a [http://www.admin-magazine.com/HPC/Articles/Programming-with-OpenMP tutorial for getting started with OpenMP under Linux].


== Compilation ==
For the majority of compilers, compiling an OpenMP program is done by simply adding a command-line option to the compilation flags. For the GNU compilers ([[GCC/en|GCC]]), it is <tt>-fopenmp</tt>, but for [[Intel/en|Intel]] it is <tt>-openmp</tt>. For other compilers, please refer to their documentation.
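
For example, assuming a source file named <tt>program.c</tt>, an OpenMP program can be compiled with GCC using
{{command|gcc -fopenmp program.c -o program}}
or with the Intel compiler using
{{command|icc -openmp program.c -o program}}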


== Directives ==
OpenMP directives are inserted in Fortran programs using sentinels. A sentinel is a keyword placed immediately after a symbol that marks a comment. For example:
<pre>
!$OMP directive
</pre>


In C, directives are inserted using a pragma construct, as follows:
<syntaxhighlight lang="c">
#pragma omp directive
</syntaxhighlight>
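
As a concrete sketch of a commonly used directive, the following loop sums the elements of a vector with a <tt>reduction</tt> clause, which gives each thread a private partial sum and combines the partial sums at the end of the loop (the vector and its length are purely illustrative):
<syntaxhighlight lang="c">
#include <stdio.h>

#define N 1000000

/* Static storage keeps the large array off the stack. */
static double x[N];

int main(void)
{
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        x[i] = 1.0;

    /* Each thread accumulates its own partial sum; the reduction
       clause adds the partial sums together after the loop. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += x[i];

    printf("sum = %f\n", sum);   /* expected: 1000000.000000 */
    return 0;
}
</syntaxhighlight>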


== Environment ==
There are four main environment variables that influence the execution of an OpenMP program:
<pre>
OMP_SCHEDULE
OMP_NUM_THREADS
OMP_DYNAMIC
OMP_NESTED
</pre>
They can be set and modified using the UNIX command
{{command|export OMP_NUM_THREADS{{=}}12}}
for example. In most cases, you want to set <tt>OMP_NUM_THREADS</tt> to the number of reserved cores per machine, though this could be different for a hybrid OpenMP/MPI application.


The second most important environment variable is probably <tt>OMP_SCHEDULE</tt>. This one controls how loops (and, more generally, parallel sections) are distributed. The default value depends on the compiler, and the schedule can also be specified in the source code. Possible values are
''static,n'', ''dynamic,n'', ''guided,n'' or ''auto''. For the first three cases, ''n'' corresponds to the number of iterations managed by each thread. For the ''static'' case, the number of iterations is fixed, and iterations are distributed at the beginning of the parallel section. For the ''dynamic'' case, the number of iterations is fixed, but they are distributed during execution, as a function of the time required by each thread to execute its iterations. For the ''guided'' case, ''n'' corresponds to the minimal number of iterations. The number of iterations is first chosen to be "large", but dynamically shrinks gradually as the remaining number of iterations diminishes. For the ''auto'' mode, the compiler and the library are free to choose what to do.
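
The schedule can also be requested directly in the source code through the <tt>schedule</tt> clause of a parallel loop; using <tt>schedule(runtime)</tt> instead defers the choice to the <tt>OMP_SCHEDULE</tt> environment variable. A minimal sketch, with a purely illustrative loop body:
<syntaxhighlight lang="c">
#include <math.h>
#include <stdio.h>

#define N 10000

int main(void)
{
    double a[N];

    /* Iterations are handed out in chunks of 100; with "dynamic",
       a thread asks for a new chunk as soon as it finishes one,
       which helps when iterations have uneven costs. */
    #pragma omp parallel for schedule(dynamic, 100)
    for (int i = 0; i < N; i++)
        a[i] = sin(0.001 * i);

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}
</syntaxhighlight>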


The advantage of the ''dynamic'', ''guided'' and ''auto'' cases is that they theoretically allow a better load balancing of the threads, as they dynamically adjust the work assigned to each thread. Their disadvantage is that the programmer does not know in advance on which core a given thread will execute, and which memory it will need to access. Hence, with this kind of scheduling, it is impossible to predict the affinity between memory and the executing core. This can be particularly problematic in a
[http://en.wikipedia.org/wiki/Non_Uniform_Memory_Access NUMA] architecture.


Environment variables specific to the Intel compiler start with <tt>KMP_</tt> whereas those specific to GNU start with <tt>GOMP_</tt>. For optimal performance regarding memory access, it is important to set the
<tt>OMP_PROC_BIND</tt> variable as well as the affinity variables, <tt>KMP_AFFINITY</tt> for Intel, and <tt>GOMP_CPU_AFFINITY</tt> for GNU compilers. This prevents the operating system from moving OpenMP threads between processors, which is particularly important on the [http://en.wikipedia.org/wiki/Non_Uniform_Memory_Access NUMA] architectures found in most modern computers.
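
As an illustration (the exact values depend on the machine; a node with 16 cores is assumed here), binding could be requested with
{{command|export OMP_PROC_BIND{{=}}true}}
together with, for the GNU compilers,
{{command|export GOMP_CPU_AFFINITY{{=}}0-15}}
or, for the Intel compilers,
{{command|export KMP_AFFINITY{{=}}granularity{{=}}fine,compact}}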