GNU Parallel: Difference between revisions


Revision as of 16:05, 19 August 2016

147 bytes added

Marked this version for translation

Stubbsda


@@ Line 1: / Line 1: @@
 <languages />
 <translate>
-== Introduction ==
+== Introduction == <!--T:1-->
 [http://www.gnu.org/software/parallel/ GNU Parallel] is a tool for running many sequential tasks at the same time on one or more nodes. It is useful for running a large number of sequential tasks, especially if they are short or of variable duration, as well as when doing a parameter sweep. We cover only the basic options here; for more advanced usage, please see the [http://www.gnu.org/software/parallel/man.html official documentation].
+<!--T:2-->
 By default, <tt>parallel</tt> runs as many tasks simultaneously as there are cores on the node, which maximizes resource usage. You can change this behaviour with the <tt>--jobs</tt> option followed by the number of tasks that GNU Parallel should run simultaneously. When a task finishes, <tt>parallel</tt> automatically starts a new one.
-== Basic Usage ==
+== Basic Usage == <!--T:3-->
 GNU Parallel uses curly brackets <tt>{}</tt> as a placeholder for the argument passed to the command to be run. For example, to run <tt>gzip</tt> on all the text files in a directory, you can execute
 {{Command|ls *.txt {{!}} parallel gzip {{(}}{{)}} }}
+<!--T:4-->
 An alternative syntax is to use <tt>:::</tt>, such as this example:
 {{Command
@@ Line 19: / Line 21: @@
 }}
+<!--T:5-->
 Note that GNU Parallel refers to each of the commands it executes as a <i>job</i>. This can be confusing because on many Compute Canada systems, a job is a batch script run by a scheduler or resource manager, and GNU Parallel would be used inside that job. From that perspective, GNU Parallel's jobs are <i>sub-jobs</i>.
-== Multiple Arguments ==
+== Multiple Arguments == <!--T:6-->
 You can also use multiple arguments by enumerating them, for example:
 {{Command
@@ Line 34: / Line 37: @@
 }}
-== File Content as Argument List ==
+== File Content as Argument List == <!--T:7-->
 The syntax <tt>::::</tt> takes the content of a file to generate the list of values for the arguments. For example, if you have a list of parameter values in the file <tt>mylist.txt</tt>, you may display its content with:
 {{Command|parallel echo {{(}}1{{)}} :::: mylist.txt}}
-== File Content as Command List ==
+== File Content as Command List == <!--T:8-->
 GNU Parallel can also interpret the lines of a file as the actual sub-jobs to be run in parallel, by using input redirection. For example, if you have a list of sub-jobs in the file <tt>mycommands.txt</tt> (one per line), you can run them in parallel as follows:
 {{Command|parallel < mycommands.txt}}
+<!--T:9-->
 Note that no command argument is given to <tt>parallel</tt>. This usage mode can be particularly useful if the sub-jobs contain symbols that are special to GNU Parallel, or if each sub-job consists of several commands (e.g. <tt>cd dir1 && ./executable</tt>).
-==Running on Multiple Nodes==
+==Running on Multiple Nodes== <!--T:10-->
 You can also use GNU Parallel to distribute a workload across multiple nodes in a cluster, for example within a job running on a Compute Canada system. An example of this use is the following:
 {{Command
@@ Line 51: / Line 55: @@
 In this case, we assume that each node has 12 CPU cores, and we use the <tt>$PBS_NODEFILE</tt> file, created automatically by the job scheduler, to tell GNU Parallel which nodes to distribute the tasks to. The <tt>--env</tt> option transfers a named environment variable to all the nodes, while the <tt>--workdir</tt> option ensures that the tasks start in the same directory as on the main node.
-==Keeping Track of Completed and Failed Commands, and Restart Capabilities==
+==Keeping Track of Completed and Failed Commands, and Restart Capabilities== <!--T:11-->
 You can tell GNU Parallel to keep track of which commands have completed by using the <tt>--joblog JOBLOGFILE</tt> argument. The file JOBLOGFILE will contain the list of completed commands with their start times, durations, hosts, and exit values. For example:
 {{Command|ls *.txt {{!}} parallel --joblog gzip.log gzip {{(}}{{)}} }}
+<!--T:12-->
 The job log makes several restart options possible. If the <tt>parallel</tt> command was interrupted (e.g. because your job exceeded its requested walltime), you can make it pick up where it left off with the <tt>--resume</tt> option, e.g.
 {{Command|ls *.txt {{!}} parallel --resume --joblog gzip.log gzip {{(}}{{)}} }}
 New entries will be appended to the existing log file.
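To see <tt>--resume</tt> in action without waiting for a real walltime limit, the sketch below simply runs the same command twice with the same job log. Since every sub-job is already recorded as finished after the first run, the second run executes nothing. It assumes GNU Parallel is installed.

```shell
#!/bin/sh
# Sketch of --resume: a second run with the same command, input and
# joblog skips every sub-job already recorded as finished.
# Assumes GNU Parallel is on the PATH.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# First run: three sub-jobs complete and are recorded in run.log.
parallel -k --joblog run.log echo ::: a b c

# Second run: all three are already in run.log, so nothing is executed.
second=$(parallel -k --resume --joblog run.log echo ::: a b c)
echo "second run printed: '${second}'"
```

Because finished sub-jobs are identified by their sequence numbers in the log, the command and the list of inputs should stay the same between the original run and the resumed one.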
+<!--T:13-->
 If some of the sub-jobs failed (i.e. they returned a non-zero exit code) and you think that you have eliminated the source of the error, you can re-run the failed ones using the <tt>--resume-failed</tt> option, e.g.
 {{Command|ls *.txt {{!}} parallel --resume-failed --joblog gzip.log gzip {{(}}{{)}} }}
 Note that this will also start any sub-jobs that had not been run before.
 </translate>