<languages />
<translate>
== Description == <!--T:1-->
[http://openmp.org/wp/ OpenMP] (Open Multi-Processing) is an application programming interface (API) for shared memory parallel computing. It is supported on numerous platforms, including Linux and Windows, and is available for the C/C++ and Fortran programming languages. The API consists of a set of directives, a software library, and environment variables.


<!--T:2-->
OpenMP allows one to develop fine-grained parallel applications on a multicore machine while making it possible to preserve the structure of the serial code. Although there is only one program instance running, it can execute multiple subtasks in parallel. Directives inserted into the program control whether a section of the program executes in parallel, and if so, they also control the distribution of work among subtasks. The beauty of these directives is that they are usually non-intrusive. A compiler that does not support them can still compile the program and the user can run it serially.


<!--T:3-->
OpenMP relies on the notion of [https://en.wikipedia.org/wiki/Thread_(computing) threads]. A thread is a bit like a lightweight process or a "virtual processor, operating serially", and can formally be defined as the smallest unit of work/processing that can be scheduled by an operating system. From a programmer's point of view, if there are five threads, then that corresponds virtually to five cores that can do a computation in parallel. It is important to understand that the number of threads is independent of the number of physical cores within the computer. Two cores can, for example, run a program with ten threads; the operating system decides how to share the cores' time between threads.


<!--T:4-->
Conversely, one thread can ''not'' be executed by two processors at the same time, so if you have (e.g.) four cores available you must create at least four threads in order to take advantage of them all. In some cases it ''may'' be advantageous to use more threads than the number of available cores, but the usual practice is to match the number of threads to the number of cores.


<!--T:5-->
Another important point concerning threads is synchronization. When multiple threads in a program do computations at the same time, one must assume nothing about the order in which things happen. If the order matters for the correctness of the code, then the programmer must use OpenMP synchronization directives to achieve that. Also, the precise distribution of threads over cores is unknown to the programmer (although thread [https://en.wikipedia.org/wiki/Processor_affinity affinity] capabilities are available to control that).
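
As a minimal illustrative sketch (not part of the original page), the following C program has each thread increment a shared counter. Without the <tt>critical</tt> directive the increments would race and the final value would be unpredictable; with it, only one thread updates the counter at a time.
<syntaxhighlight lang="c">
#include <stdio.h>
#include <omp.h>

int main() {
   long counter = 0;
   #pragma omp parallel
   {
      for (int i = 0; i < 100000; i++) {
         /* Only one thread at a time may execute the next statement;
            removing this directive makes the result unpredictable. */
         #pragma omp critical
         counter++;
      }
   }
   /* With N threads the final value is N * 100000. */
   printf("counter = %ld\n", counter);
   return 0;
}
</syntaxhighlight>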


<!--T:24-->
When parallelizing a program using OpenMP (or any other technique), it is important to also consider how well the program is able to run in parallel, known as the software's [[scalability]]. After you have parallelized your software and are satisfied with its correctness, we recommend that you perform a scaling analysis in order to understand its parallel performance.
<!--T:6-->
The following link points to a [http://www.admin-magazine.com/HPC/Articles/Programming-with-OpenMP tutorial for getting started with OpenMP under Linux].


== Compilation == <!--T:7-->
For most compilers, compiling an OpenMP program is done by simply adding a command-line option to the compilation flags. For the GNU compilers (GCC) it is <tt>-fopenmp</tt>, but for Intel [https://github.com/OpenMathLib/OpenBLAS/issues/1546 depending on the version] it may be <tt>-qopenmp</tt>, <tt>-fopenmp</tt>, or <tt>-openmp</tt>. Please refer to your specific compiler's documentation.
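
For example, to compile a C file with GCC (the file and binary names here are only illustrative):
{{Command|gcc -fopenmp hello.c -o hello}}
With a classic Intel compiler the equivalent would typically be
{{Command|icc -qopenmp hello.c -o hello}}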


== Directives == <!--T:8-->


<!--T:9-->
OpenMP directives are inserted in Fortran programs using sentinels. A sentinel is a keyword placed immediately after a symbol that marks a comment. For example:
</translate>
<pre>
!$OMP directive
c$OMP directive
C$OMP directive
*$OMP directive
</pre>
 
<translate>
<!--T:10-->
In C, directives are inserted using a pragma construct, as follows:
</translate>
<syntaxhighlight lang="c">
#pragma omp directive
</syntaxhighlight>


 
<translate>
===OpenMP directives=== <!--T:11-->
</translate>
{| class="wikitable" style="font-size: 88%; text-align: left;"
! scope="col" width="260px" | Fortran
! scope="col" width="260px" | C, C++
|-
|<nowiki>!</nowiki>$OMP PARALLEL [clause, clause,…] <br /> block <br /> !$OMP END PARALLEL
|#pragma omp parallel [clause, clause,…] <br /> structured-block
|-
|<nowiki>!</nowiki>$OMP DO [ clause, clause,… ] <br /> do_loop <br /> !$OMP END DO
|#pragma omp for [ clause, clause,… ] <br /> for-loop
|-
|<nowiki>!</nowiki>$OMP SECTIONS [clause, clause,…] <br /> !$OMP SECTION <br /> block <br /> !$OMP SECTION <br /> block <br /> !$OMP END SECTIONS [NOWAIT]
|#pragma omp sections [clause, clause,…] { <br /> [ #pragma omp section ] <br /> structured-block <br /> [ #pragma omp section ] <br /> structured-block <br /> }
|-
|<nowiki>!</nowiki>$OMP SINGLE [clause, clause,…] <br /> block <br /> !$OMP END SINGLE [NOWAIT]
|#pragma omp single [clause, clause,…] <br /> structured-block
|-
|<nowiki>!</nowiki>$OMP PARALLEL DO [clause, clause,…] <br /> do_loop <br /> [ !$OMP END PARALLEL DO ]
|#pragma omp parallel for [clause, clause,…] <br /> for-loop
|-
|<nowiki>!</nowiki>$OMP PARALLEL SECTIONS [clause, clause,…] <br /> !$OMP SECTION <br /> block <br /> !$OMP SECTION <br /> block <br /> !$OMP END PARALLEL SECTIONS
|#pragma omp parallel sections [clause, clause,…] { <br /> [ #pragma omp section ] <br /> structured-block <br /> [ #pragma omp section ] <br /> structured-block <br /> }
|-
|<nowiki>!</nowiki>$OMP MASTER <br /> block <br /> !$OMP END MASTER
|#pragma omp master <br /> structured-block
|-
|<nowiki>!</nowiki>$OMP CRITICAL [(name)] <br /> block <br /> !$OMP END CRITICAL [(name)]
|#pragma omp critical [(name)] <br /> structured-block
|-
|<nowiki>!</nowiki>$OMP BARRIER
|#pragma omp barrier
|-
|<nowiki>!</nowiki>$OMP ATOMIC <br /> expression_statement
|#pragma omp atomic <br /> expression-statement
|-
|<nowiki>!</nowiki>$OMP FLUSH [(list)]
|#pragma omp flush [(list)]
|-
|<nowiki>!</nowiki>$OMP ORDERED <br /> block <br /> !$OMP END ORDERED
|#pragma omp ordered <br /> structured-block
|-
|<nowiki>!</nowiki>$OMP THREADPRIVATE( /cb/[, /cb/]…)
|#pragma omp threadprivate ( list )
|-
! colspan="2" | Clauses
|-
|PRIVATE ( list )
|private ( list )
|-
|SHARED ( list )
|shared ( list )
|-
|DEFAULT ( PRIVATE <nowiki>|</nowiki> SHARED <nowiki>|</nowiki> NONE )
|default ( shared <nowiki>|</nowiki> none )
|-
|FIRSTPRIVATE ( list )
|firstprivate ( list )
|-
|LASTPRIVATE ( list )
|lastprivate ( list )
|-
|REDUCTION ( { operator <nowiki>|</nowiki> intrinsic } : list )
|reduction ( op : list )
|-
|IF ( scalar_logical_expression )
|if ( scalar-expression )
|-
|COPYIN ( list )
|copyin ( list )
|-
|NOWAIT
|nowait
|}
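
To show how a directive and its clauses fit together in practice, here is a small illustrative C program (a sketch added here, not taken from the reference table above) that sums an array with a <tt>parallel for</tt> directive using <tt>schedule</tt> and <tt>reduction</tt> clauses:
<syntaxhighlight lang="c">
#include <stdio.h>
#include <omp.h>

#define N 1000000

static double a[N];

int main() {
   double sum = 0.0;

   for (int i = 0; i < N; i++)
      a[i] = 0.001 * i;

   /* Iterations are divided among the threads; each thread accumulates a
      private partial sum, and the reduction clause combines them at the end. */
   #pragma omp parallel for schedule(static) reduction(+:sum)
   for (int i = 0; i < N; i++)
      sum += a[i];

   printf("sum = %f\n", sum);
   return 0;
}
</syntaxhighlight>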
 
<translate>
== Environment == <!--T:12-->
There are a few environment variables that influence the execution of an OpenMP program:
</translate>
<pre>
OMP_NUM_THREADS
OMP_SCHEDULE
OMP_DYNAMIC
OMP_STACKSIZE
OMP_NESTED
</pre>
<translate>
<!--T:13-->
They can be set and modified using a UNIX command such as
{{Command|export OMP_NUM_THREADS{{=}}12}}


<!--T:26-->
In most cases, you want to set <tt>OMP_NUM_THREADS</tt> to the number of reserved cores per machine, though this could be different for a hybrid OpenMP/MPI application.


<!--T:14-->
The second most important environment variable is probably <tt>OMP_SCHEDULE</tt>. This one controls how loops (and, more generally, parallel sections) are distributed. The default value depends on the compiler, and can also be added into the source code. Possible values are ''static,n'', ''dynamic,n'', ''guided,n'' or ''auto''. For the first three cases, ''n'' corresponds to the number of iterations managed by each thread. For the ''static'' case, the number of iterations is fixed, and iterations are distributed at the beginning of the parallel section. For the ''dynamic'' case, the number of iterations is fixed, but they are distributed during execution, as a function of the time required by each thread to execute its iterations. For the ''guided'' case, ''n'' corresponds to the minimal number of iterations. The number of iterations is first chosen to be "large", but dynamically shrinks gradually as the remaining number of iterations diminishes. For the ''auto'' mode, the compiler and the library are free to choose what to do.
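
For example, a dynamic schedule with chunks of 100 iterations (the values are only illustrative) can be requested at run time with
{{Command|export OMP_SCHEDULE{{=}}"dynamic,100"}}
Note that this environment variable only affects loops whose directive uses the <tt>schedule(runtime)</tt> clause.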
<!--T:15-->
The advantage of the cases ''dynamic'', ''guided'' and ''auto'' is that they theoretically allow better load-balancing, as they dynamically adjust the work assigned to each thread. Their disadvantage is that the programmer does not know in advance on which core a given thread will execute and which memory it will need to access. Hence, with this kind of scheduling, it is impossible to predict the affinity between memory and the executing core. This can be particularly problematic in a [http://en.wikipedia.org/wiki/Non_Uniform_Memory_Access NUMA] architecture.


<!--T:25-->
The <code>OMP_STACKSIZE</code> environment variable specifies the size of the stack for each thread created by the OpenMP runtime. Note that the main OpenMP thread (executing the sequential part of the OpenMP program) gets its stack size from the execution shell, while <code>OMP_STACKSIZE</code> applies to each additional thread created at runtime. If <code>OMP_STACKSIZE</code> is not set, its implied value will be 4M. If your OpenMP code does not have enough stack memory, it might crash with a segmentation fault error message.
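
For example, to give each additional thread 512 MB of stack (the value is only illustrative):
{{Command|export OMP_STACKSIZE{{=}}512M}}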


<!--T:16-->
Other environment variables are also available. Certain variables are specific to a compiler whereas others are more generic. For an exhaustive list for Intel compilers, please see [http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-lin/GUID-E1EC94AE-A13D-463E-B3C3-6D7A7205F5A1.htm the following web site], and for GNU compilers, see [http://gcc.gnu.org/onlinedocs/libgomp/Environment-Variables.html this one].
<!--T:17-->
Environment variables specific to the Intel compiler start with <tt>KMP_</tt> whereas those specific to GNU start with <tt>GOMP_</tt>. For optimal performance regarding memory access, it is important to set the <tt>OMP_PROC_BIND</tt> variable as well as the affinity variables, <tt>KMP_AFFINITY</tt> for Intel and <tt>GOMP_CPU_AFFINITY</tt> for GNU compilers. This prevents the operating system from moving OpenMP threads between processors, which is particularly important on the [http://en.wikipedia.org/wiki/Non_Uniform_Memory_Access NUMA] architecture found in most modern computers.
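
As an illustrative sketch (the core list and affinity settings depend on your hardware and job), thread binding could be requested as follows:
<pre>
export OMP_PROC_BIND=true                       # keep threads where they start
export GOMP_CPU_AFFINITY="0-7"                  # GNU: pin threads to cores 0 through 7
export KMP_AFFINITY=granularity=fine,compact    # Intel equivalent
</pre>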


== Example == <!--T:18-->
Here is a ''Hello world'' example that shows the use of OpenMP.
</translate>


<tabs>
<tab name="C">
{{File
  |name=hello.c
  |lang="c"
  |contents=
#include <stdio.h>
#include <omp.h>

int main() {
  #pragma omp parallel
   {
      printf("Hello world from thread %d out of %d\n",
               omp_get_thread_num(),omp_get_num_threads());
   }
   return 0;
}
 
}}
</tab>
<tab name="Fortran">
{{File
  |name=hello.f90
  |lang="fortran"
  |contents=
  program hello
   implicit none
   integer omp_get_thread_num,omp_get_num_threads
  !$omp parallel
   print *, 'Hello world from thread',omp_get_thread_num(), &
            'out of',omp_get_num_threads()
  !$omp end parallel

end program hello
}}
</tab>
</tabs>
 
<translate>
<!--T:19-->
Compiling and running the C code goes as follows:
 litai10:~$ gcc -O3 -fopenmp hello.c -o hello
 litai10:~$ export OMP_NUM_THREADS=4
 litai10:~$ ./hello
 Hello world from thread 0 out of 4
 Hello world from thread 2 out of 4
 Hello world from thread 1 out of 4
 Hello world from thread 3 out of 4


<!--T:20-->
Compiling and running the Fortran 90 code is as follows:
 litai10:~$ gfortran -O3 -fopenmp hello.f90 -o fhello
 litai10:~$ export OMP_NUM_THREADS=4
 litai10:~$ ./fhello
 Hello world from thread           0 out of           4
 Hello world from thread           2 out of           4
 Hello world from thread           1 out of           4
 Hello world from thread           3 out of           4
<!--T:23-->
For an example of how to submit an OpenMP job, see [[Running jobs#Threaded or OpenMP job|Running jobs]].
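
As a minimal sketch (the account name and resources are placeholders; consult that page for authoritative examples), a threaded job script could look like:
{{File
  |name=omp_job.sh
  |lang="sh"
  |contents=
#!/bin/bash
#SBATCH --account=def-someuser   # replace with your own account
#SBATCH --time=0:15:00           # requested walltime
#SBATCH --cpus-per-task=4        # number of OpenMP threads
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./hello
}}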
== References == <!--T:21-->
Lawrence Livermore National Labs has a comprehensive [https://computing.llnl.gov/tutorials/openMP tutorial on OpenMP].
<!--T:22-->
[http://www.openmp.org/ OpenMP.org] publishes the formal [http://www.openmp.org/specifications/ specifications],
handy reference cards for the [http://www.openmp.org/wp-content/uploads/OpenMP-4.0-C.pdf C/C++] and [http://www.openmp.org/wp-content/uploads/OpenMP-4.0-Fortran.pdf Fortran] interfaces, and [http://www.openmp.org/wp-content/uploads/openmp-examples-4.0.2.pdf examples].
</translate>
