OpenMP


Description

OpenMP (Open Multi-Processing) is a programming interface for shared-memory parallel computing. The API is supported on numerous platforms, including Unix and Windows, for the C/C++ and Fortran programming languages. It consists of a set of compiler directives, a run-time library, and environment variables.

OpenMP allows for the rapid development of fine-grained parallel applications while staying close to the serial code. There is only one program instance, executed in parallel on multiple processors. Directives inserted into the program control how the computation is distributed among the processors.

The OpenMP interface uses the notion of threads, well known in concurrent programming. A thread can be thought of as a virtual processor operating serially: from the programmer's point of view, five threads correspond, virtually, to five processors that can carry out computations in parallel. It is important to understand that the number of threads is independent of the number of physical processors in the computer. Two processors can, for example, run a program with 10 threads; the operating system decides how to share the processors' time between the threads.

That said, if you have four available processors, you should use at least four threads in order to take advantage of all the available computing power, since a thread cannot be executed by two processors at the same time. In certain cases it can be advantageous to use more threads than the number of available processors, but using too many threads is not recommended.

Another important point concerning threads is synchronization. When multiple threads in the same program perform computations at the same time, absolutely nothing can be assumed about the order in which things happen. Exactly how the threads are distributed among the processors is unknown to the programmer.
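
As an illustration, the following short C program (a minimal sketch; the file name hello_omp.c is only a suggestion) starts a parallel region in which every thread prints its identifier. Running it several times will typically produce the lines in a different order, which shows that nothing can be assumed about the scheduling of the threads.

 /* hello_omp.c: each thread of the parallel region reports its ID */
 #include <stdio.h>
 #include <omp.h>

 int main(void)
 {
     #pragma omp parallel
     {
         int id = omp_get_thread_num();        /* this thread's number    */
         int nthreads = omp_get_num_threads(); /* size of the thread team */
         printf("Hello from thread %d of %d\n", id, nthreads);
     }
     return 0;
 }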

The following link points to a tutorial for getting started with OpenMP under Linux.

Compilation

Compiling an OpenMP program is done by simply adding a command-line option for the majority of compilers. For the GNU compilers (GCC) this is -fopenmp, while for the Intel compilers it is -qopenmp (older Intel versions used -openmp). For other compilers, please refer to their documentation.
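
For example, assuming the small hello_omp.c program shown above, it could be compiled as follows (the exact compiler commands available depend on your system):

 gcc -fopenmp hello_omp.c -o hello_omp     # GNU compilers
 icc -qopenmp hello_omp.c -o hello_omp     # Intel compilers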

Directives

OpenMP directives are inserted in Fortran programs using sentinels. A sentinel is a keyword placed immediately after a symbol that marks a comment. For example:

!$OMP directive 
c$OMP directive 
C$OMP directive 
*$OMP directive

In C, directives are inserted using a pragma, as follows:

#pragma omp directive


OpenMP directives

Each directive is listed below in its Fortran form, followed by the equivalent form for C and C++.

Fortran:
 !$OMP PARALLEL [clause, clause,…]
   block
 !$OMP END PARALLEL
C and C++:
 #pragma omp parallel [clause, clause,…]
   structured-block

Fortran:
 !$OMP DO [clause, clause,…]
   do_loop
 !$OMP END DO
C and C++:
 #pragma omp for [clause, clause,…]
   for-loop

Fortran:
 !$OMP SECTIONS [clause, clause,…]
 !$OMP SECTION
   block
 !$OMP SECTION
   block
 !$OMP END SECTIONS [NOWAIT]
C and C++:
 #pragma omp sections [clause, clause,…]
 {
   [ #pragma omp section ]
     structured-block
   [ #pragma omp section ]
     structured-block
 }

Fortran:
 !$OMP SINGLE [clause, clause,…]
   block
 !$OMP END SINGLE [NOWAIT]
C and C++:
 #pragma omp single [clause, clause,…]
   structured-block

Fortran:
 !$OMP PARALLEL DO [clause, clause,…]
   do_loop
 [ !$OMP END PARALLEL DO ]
C and C++:
 #pragma omp parallel for [clause, clause,…]
   for-loop

Fortran:
 !$OMP PARALLEL SECTIONS [clause, clause,…]
 !$OMP SECTION
   block
 !$OMP SECTION
   block
 !$OMP END PARALLEL SECTIONS
C and C++:
 #pragma omp parallel sections [clause, clause,…]
 {
   [ #pragma omp section ]
     structured-block
   [ #pragma omp section ]
     structured-block
 }

Fortran:
 !$OMP MASTER
   block
 !$OMP END MASTER
C and C++:
 #pragma omp master
   structured-block

Fortran:
 !$OMP CRITICAL [(name)]
   block
 !$OMP END CRITICAL [(name)]
C and C++:
 #pragma omp critical [(name)]
   structured-block

Fortran:
 !$OMP BARRIER
C and C++:
 #pragma omp barrier

Fortran:
 !$OMP ATOMIC
   expression_statement
C and C++:
 #pragma omp atomic
   expression-statement

Fortran:
 !$OMP FLUSH [(list)]
C and C++:
 #pragma omp flush [(list)]

Fortran:
 !$OMP ORDERED
   block
 !$OMP END ORDERED
C and C++:
 #pragma omp ordered
   structured-block

Fortran:
 !$OMP THREADPRIVATE( /cb/[, /cb/]…)
C and C++:
 #pragma omp threadprivate ( list )

Clauses

Fortran / C and C++:
 PRIVATE ( list ) / private ( list )
 SHARED ( list ) / shared ( list )
 DEFAULT ( PRIVATE | SHARED | NONE ) / default ( shared | none )
 FIRSTPRIVATE ( list ) / firstprivate ( list )
 LASTPRIVATE ( list ) / lastprivate ( list )
 REDUCTION ( { operator | intrinsic } : list ) / reduction ( op : list )
 IF ( scalar_logical_expression ) / if ( scalar-expression )
 COPYIN ( list ) / copyin ( list )
 NOWAIT / nowait
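
As an illustration of these directives and clauses, the following C sketch (a hypothetical file sum_omp.c) uses the parallel for directive with a reduction clause to sum the elements of an array: each thread accumulates a private partial sum, and the partial sums are combined at the end of the loop.

 /* sum_omp.c: parallel sum of an array using a reduction clause */
 #include <stdio.h>

 #define N 1000

 int main(void)
 {
     double a[N], sum = 0.0;

     for (int i = 0; i < N; i++)
         a[i] = 0.5 * i;                       /* fill the array serially */

     #pragma omp parallel for reduction(+:sum) /* iterations shared among threads */
     for (int i = 0; i < N; i++)
         sum += a[i];

     printf("sum = %f\n", sum);
     return 0;
 }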

Environment

The following four environment variables influence the execution of an OpenMP program. Depending on the command shell used, they can be set with export VARIABLE=value (sh/bash family) or setenv VARIABLE value (csh/tcsh family).

OMP_SCHEDULE
OMP_NUM_THREADS
OMP_DYNAMIC
OMP_NESTED

In most cases, you want to set OMP_NUM_THREADS to ppn, the number of processors (cores) reserved per machine. This could be different for a hybrid OpenMP/MPI application.
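
For example, in a bash-type shell, to run the hello_omp program from the earlier sketch with four threads:

 export OMP_NUM_THREADS=4
 ./hello_omp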

The second most important environment variable is probably OMP_SCHEDULE. It controls how loop iterations (and, more generally, parallel sections) are distributed among the threads. The default value depends on the compiler, and the schedule can also be specified in the source code using the schedule clause. Possible values are static,n, dynamic,n, guided,n or auto. In the first three cases, n is the number of iterations handed to a thread at a time. With the static schedule, the number of iterations per thread is fixed and the iterations are distributed at the beginning of the parallel section. With the dynamic schedule, the chunk size is fixed but the chunks are handed out during execution, depending on how quickly each thread finishes its iterations. With the guided schedule, n is the minimal chunk size; the chunks start out large and gradually shrink as the number of remaining iterations diminishes. With auto, the compiler and the run-time library are free to choose the schedule.
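
The schedule can be left to run time by using the schedule(runtime) clause, in which case OMP_SCHEDULE selects it. The following C sketch (hypothetical file schedule_omp.c) shows such a loop; running it after, for example, export OMP_SCHEDULE="dynamic,4" distributes the iterations in chunks of four, handed out as threads become free.

 /* schedule_omp.c: the loop schedule is taken from OMP_SCHEDULE at run time */
 #include <stdio.h>
 #include <omp.h>

 int main(void)
 {
     #pragma omp parallel for schedule(runtime)
     for (int i = 0; i < 16; i++)
         printf("iteration %2d executed by thread %d\n", i, omp_get_thread_num());
     return 0;
 }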

The advantage of the dynamic, guided and auto schedules is that, in theory, they balance the load between threads better, since the work is adjusted dynamically according to the time each thread takes. Their disadvantage is that you do not know in advance which processor will execute a given thread, nor which memory it will need to access. With this kind of scheduling it is therefore impossible to predict the affinity between the memory and the executing processor, which can be particularly problematic in a NUMA architecture.

Other environment variables are also available. Some are specific to a particular compiler while others are more generic. For an exhaustive list, please refer to the documentation of the Intel compilers and of the GNU compilers.

Environment variables specific to the Intel compilers start with KMP_, whereas those specific to GNU start with GOMP_. For optimal memory access performance, it is important to set the OMP_PROC_BIND variable

as well as the affinity variables,

 KMP_AFFINITY

for Intel, and

 GOMP_CPU_AFFINITY

for GNU compilers. Binding prevents the operating system from moving OpenMP threads between processors, which is particularly important on NUMA architectures, such as those found in most modern machines.
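
For example, one might set the following in a bash-type shell; the values shown are only an illustration, and the appropriate ones depend on the compiler used and on the node's topology:

 export OMP_PROC_BIND=true                       # keep each thread bound to its core
 export KMP_AFFINITY="granularity=fine,compact"  # Intel compilers
 export GOMP_CPU_AFFINITY="0-7"                  # GNU compilers: bind to cores 0 to 7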