(Updating to match new version of source page) |
Line 34: | Line 34: | ||
= Cluster job submission = | = Cluster job submission = | ||
Below are prototype Slurm scripts for submitting thread-based and MPI-based parallel simulations to single or multiple compute nodes. Most users will find it sufficient to use the <i> | Below are prototype Slurm scripts for submitting thread-based and MPI-based parallel simulations to single or multiple compute nodes. Most users will find it sufficient to use one of the <i>work directory</i> scripts provided in the Single Compute Node sections. The optional "memory=" argument found in the last line of the scripts is intended for large-memory or problematic jobs where the 3072MB offset value may require tuning. A listing of all Abaqus command line arguments can be obtained by loading an abaqus module and running: <code>abaqus -help | less</code>. For single node jobs that run for less than a day, the <i>work directory script</i> with restart file writing disabled should be sufficient. Single node jobs that will run for more than a day should, however, write restart files. Jobs that create large restart files will benefit from writing to local disk through the SLURM_TMPDIR environment variable used in the <i>tmp directory scripts</i> provided in the two rightmost tabs of the Single Node standard and explicit analysis sections. The restart scripts shown here will continue jobs that have been terminated early. Such failures can occur if a job reaches its maximum requested runtime before completing and is killed by the queue, or if the compute node the job was running on crashes due to an unexpected hardware failure. Other restart types are possible by further tailoring the input file (not shown here) to continue a job with additional steps or to change the analysis (see the documentation for version-specific details).
Jobs that require more memory or compute resources than a single compute node can provide should use the MPI scripts in the Multiple Node sections below to distribute computing over arbitrary node ranges determined automatically by the scheduler. Before running any long jobs, run short scaling tests to measure wall clock time (and memory requirements) as a function of the number of cores (2, 4, 8, etc.) and so determine the optimal number. | ||
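The scaling test described above can be sketched as a dry run that only prints the submission commands. This is a minimal sketch, not part of the provided scripts: the function name <code>scaling_commands</code> and the job names are hypothetical, and it assumes the submitted script takes its core count from <code>--cpus-per-task</code> (options given on the <code>sbatch</code> command line take precedence over <code>#SBATCH</code> lines in the script).

```shell
#!/bin/bash
# Dry-run sketch: print one sbatch command per core count to test.
# Assumption: the job script reads its core count from --cpus-per-task.

scaling_commands() {
  local script="$1"   # slurm script to submit
  shift
  local n
  for n in "$@"; do   # remaining arguments are the core counts
    echo "sbatch --cpus-per-task=${n} --job-name=scale_${n} ${script}"
  done
}

# Print the commands for 2, 4 and 8 cores; pipe to bash to actually submit.
scaling_commands scriptsp1.txt 2 4 8
```

Once the test jobs finish, their elapsed times and peak memory can be compared, for example with <code>sacct --format=JobName,AllocCPUS,Elapsed,MaxRSS</code>.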
== Standard Analysis == | == Standard Analysis == | ||
Abaqus solvers support thread-based and MPI-based parallelization. Scripts for each type are provided below for running Standard Analysis type jobs on Single or Multiple nodes respectively. | Abaqus solvers support thread-based and MPI-based parallelization. Scripts for each type are provided below for running Standard Analysis type jobs on Single or Multiple nodes respectively. Scripts to perform multiple node job restarts are not currently provided. | ||
=== Single Node Computing === | === Single Node Computing === | ||
<tabs> | <tabs> | ||
<tab name=" | <tab name="work directory script"> | ||
{{File | {{File | ||
|name="scriptsp1.txt" | |name="scriptsp1.txt" | ||
Line 75: | Line 75: | ||
cat testsp1.msg | grep "STARTS\|COMPLETED\|WRITTEN" | cat testsp1.msg | grep "STARTS\|COMPLETED\|WRITTEN" | ||
</tab> | </tab> | ||
<tab name=" | <tab name="work directory restart script"> | ||
{{File | {{File | ||
|name="scriptsp2.txt" | |name="scriptsp2.txt" | ||
Line 103: | Line 103: | ||
*RESTART, READ | *RESTART, READ | ||
</tab> | </tab> | ||
<tab name=" | <tab name="tmp directory script"> | ||
{{File | {{File | ||
|name="scriptst1.txt" | |name="scriptst1.txt" | ||
Line 144: | Line 144: | ||
cat testst1.msg | grep "STARTS\|COMPLETED\|WRITTEN" | cat testst1.msg | grep "STARTS\|COMPLETED\|WRITTEN" | ||
</tab> | </tab> | ||
<tab name=" | <tab name="tmp directory restart script"> | ||
{{File | {{File | ||
|name="scriptst2.txt" | |name="scriptst2.txt" | ||
Line 289: | Line 289: | ||
No input file modifications are required to restart the analysis. | No input file modifications are required to restart the analysis. | ||
</tab> | </tab> | ||
<tab name=" | <tab name="tmp directory job script"> | ||
{{File | {{File | ||
|name="scriptet1.txt" | |name="scriptet1.txt" | ||
Line 330: | Line 330: | ||
cat testet1.sta | grep Restart | cat testet1.sta | grep Restart | ||
</tab> | </tab> | ||
<tab name=" | <tab name="tmp directory restart job script"> | ||
{{File | {{File | ||
|name="scriptet2.txt" | |name="scriptet2.txt" | ||
Line 432: | Line 432: | ||
</source> | </source> | ||
To completely satisfy the recommended "MEMORY TO OPERATIONS REQUIRED MINIMIZE I/O" (MRMIO) value at least the same | To completely satisfy the recommended "MEMORY TO OPERATIONS REQUIRED MINIMIZE I/O" (MRMIO) value, at least the same amount of non-swapped physical memory (RES) must be available to Abaqus. Since the RES will in general be less than the virtual memory (VIRT) by some relatively constant amount for a given simulation, it is necessary to slightly over-allocate the requested Slurm node memory <code>--mem=</code>. In the above sample Slurm script this over-allocation has been hardcoded to a conservative value of 3072MB based on initial testing of the standard Abaqus solver. To avoid the long queue wait times associated with large values of MRMIO, it may be worth investigating the performance impact of reducing the RES memory made available to Abaqus significantly below the MRMIO. This can be done by lowering the <code>--mem=</code> value, which in turn sets an artificially low value of <code>memory=</code> in the abaqus command (found in the last line of the Slurm script). When doing this, take care that the RES does not dip below the "MINIMUM MEMORY REQUIRED" (MMR), otherwise Abaqus will exit due to "Out Of Memory" (OOM). As an example, if your MRMIO is 96GB, try running a series of short test jobs with <code>#SBATCH --mem=8G, 16G, 32G, 64G</code> until an acceptably small performance impact is found, noting that smaller values will result in increasingly larger scratch space used by temporary files. | ||
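The relationship between the Slurm request and the <code>memory=</code> argument can be illustrated with a little shell arithmetic. This is a minimal sketch under stated assumptions, not the script from this page: the helper name <code>abaqus_memory_arg</code> is hypothetical, and it assumes the allocation is read in megabytes from <code>SLURM_MEM_PER_NODE</code>, which Slurm sets when <code>--mem</code> is specified.

```shell
#!/bin/bash
# Sketch: derive the abaqus memory= value from the Slurm allocation
# by subtracting the 3072MB over-allocation offset discussed above.
OFFSET_MB=3072

abaqus_memory_arg() {
  local alloc_mb="$1"   # node memory granted by Slurm, in MB
  if [ "$alloc_mb" -le "$OFFSET_MB" ]; then
    echo "allocation of ${alloc_mb}MB does not exceed the ${OFFSET_MB}MB offset" >&2
    return 1
  fi
  echo "memory=$((alloc_mb - OFFSET_MB))MB"
}

# With "#SBATCH --mem=16G" the allocation is 16384MB; that value is
# used as a fallback here so the sketch also runs outside a job.
abaqus_memory_arg "${SLURM_MEM_PER_NODE:-16384}"
```

With a 16G request this prints <code>memory=13312MB</code>. Tuning the offset for a problematic job then amounts to changing <code>OFFSET_MB</code> while keeping the resulting value above the MMR.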
= Graphical use = | = Graphical use = |