(Added a section on using Gnu Parallel across multiple nodes.) |
No edit summary |
||
Line 5: | Line 5: | ||
== Basic Usage == | == Basic Usage == | ||
Parallel uses curly brackets <tt>{}</tt> as parameters for the command to be run. For example, to run <tt> | Parallel uses curly brackets <tt>{}</tt> as parameters for the command to be run. For example, to run <tt>gzip</tt> on all the text files in a directory, you can execute | ||
{{Command|ls *.txt {{!}} parallel | {{Command|ls *.txt {{!}} parallel gzip {{(}}{{)}} }} | ||
An alternative syntax is to use <tt>:::</tt>, such as this example: | An alternative syntax is to use <tt>:::</tt>, such as this example: | ||
Line 16: | Line 16: | ||
3 | 3 | ||
}} | }} | ||
Note that Gnu Parallel refers to each of the commands executed as <i>jobs</i>. This can be confusing because on many Compute Canada systems, a job is a batch script run by a scheduler or resource manager, and Gnu Parallel would be used inside that job. From that perspective, Gnu Parallel's jobs are <i>sub-jobs</i>. | |||
== Multiple Arguments == | == Multiple Arguments == | ||
Line 33: | Line 35: | ||
The syntax <tt>::::</tt> takes the content of a file to generate the list of values for the arguments. For example, if you have a list of parameter values in the file <tt>mylist.txt</tt>, you may display its content with: | The syntax <tt>::::</tt> takes the content of a file to generate the list of values for the arguments. For example, if you have a list of parameter values in the file <tt>mylist.txt</tt>, you may display its content with: | ||
{{Command|parallel echo {{(}}1{{)}} :::: mylist.txt}} | {{Command|parallel echo {{(}}1{{)}} :::: mylist.txt}} | ||
== File Content as Command List == | |||
Gnu Parallel can also interpret the lines of a file as the actual sub-jobs to be run in parallel, by using redirection. For example, if you have a list of sub-jobs in the file <tt>mycommands.txt</tt> (one per line), you may run them in parallel as follows: | |||
{{Command|parallel < mycommands.txt}} | |||
Note that no command argument is given to <tt>parallel</tt>. This mode is particularly useful if the sub-jobs contain symbols that are special to Gnu Parallel, or if each sub-job consists of several commands (e.g. <tt>cd dir1 && ./executable</tt>). | |||
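As a minimal sketch of this mode, the script below writes a command list in which each line chains two commands, then feeds the list to Gnu Parallel. The directory names <tt>out_1</tt> to <tt>out_3</tt> and the file name <tt>result.txt</tt> are placeholders chosen for illustration; the <tt>parallel</tt> call is skipped if Gnu Parallel is not found.

```shell
#!/bin/bash
# Build a command list with one sub-job per line.
# Each line chains two commands with &&; out_N and result.txt are
# hypothetical names used only for this illustration.
: > mycommands.txt
for i in 1 2 3; do
    echo "mkdir -p out_$i && echo result $i > out_$i/result.txt" >> mycommands.txt
done

# Run the listed sub-jobs in parallel, but only if Gnu Parallel is installed.
if command -v parallel > /dev/null 2>&1 && parallel --version 2> /dev/null | grep -q 'GNU parallel'; then
    parallel < mycommands.txt
fi
```

Generating the list with a loop keeps the quoting simple: each line is passed to the shell as written, so <tt>&&</tt> and <tt>></tt> do not need escaping on the <tt>parallel</tt> command line.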
==Running on Multiple Nodes== | ==Running on Multiple Nodes== | ||
Line 39: | Line 47: | ||
|parallel --jobs 12 --sshloginfile $PBS_NODEFILE --env MY_VARIABLE --workdir $PWD ./my_program | |parallel --jobs 12 --sshloginfile $PBS_NODEFILE --env MY_VARIABLE --workdir $PWD ./my_program | ||
}} | }} | ||
In this case, we suppose that each node has 12 CPU cores and we will use the <tt>PBS_NODEFILE</tt> file created automatically by the job scheduler to tell Gnu Parallel which nodes to use for the distribution of tasks. The <tt>--env</tt> option allows us to transfer a named environment variable to all the nodes, while the <tt>--workdir</tt> option ensures that the Gnu Parallel tasks will start in the same directory as on the main node. | In this case, we suppose that each node has 12 CPU cores and we will use the <tt>$PBS_NODEFILE</tt> file created automatically by the job scheduler to tell Gnu Parallel which nodes to use for the distribution of tasks. The <tt>--env</tt> option allows us to transfer a named environment variable to all the nodes, while the <tt>--workdir</tt> option ensures that the Gnu Parallel tasks will start in the same directory as on the main node. | ||
==Keeping Track of Completed and Failed Commands, and Restart Capabilities== | |||
You can tell Gnu Parallel to keep track of which commands have completed by using the <tt>--joblog JOBLOGFILE</tt> argument. The file JOBLOGFILE will contain the list of completed commands, their start times, durations, hosts, and exit values. For example: | |||
{{Command|ls *.txt {{!}} parallel --joblog gzip.log gzip {{(}}{{)}} }} | |||
The job log functionality opens the door to a number of possible restart options. If the <tt>parallel</tt> command was interrupted (e.g. your job ran longer than its requested walltime), you can make it pick up where it left off using the <tt>--resume</tt> option, e.g. | |||
{{Command|ls *.txt {{!}} parallel --resume --joblog gzip.log gzip {{(}}{{)}} }} | |||
The new jobs will be appended to the old log file. | |||
If some of the sub-jobs failed (i.e., they produced a non-zero exit code), and you think that you have eliminated the source of the error, you can re-run the failed ones using the <tt>--resume-failed</tt> option, e.g. | |||
{{Command|ls *.txt {{!}} parallel --resume-failed --joblog gzip.log gzip {{(}}{{)}} }} | |||
(Note that this will also start sub-jobs that were not considered before.) |
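Before re-running failed sub-jobs, it can help to see which ones they were. The sketch below assumes the usual tab-separated job log layout with a header line, in which the seventh column is the exit value and the ninth is the command; <tt>list_failed</tt> is a hypothetical helper name, not part of Gnu Parallel.

```shell
#!/bin/bash
# Print the commands recorded with a non-zero exit value in a job log.
# Assumes a tab-separated log with a header line, where column 7 is
# Exitval and column 9 is Command. list_failed is a hypothetical helper.
list_failed() {
    awk -F'\t' 'NR > 1 && $7 != 0 { print $9 }' "$1"
}

# Inspect the log from the examples above, if it exists.
if [ -f gzip.log ]; then
    list_failed gzip.log
fi
```

This only reads the log, so it is safe to run between attempts; it does not replace <tt>--resume-failed</tt>, which decides for itself which sub-jobs to start again.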