(Updating to match new version of source page) |
Line 65: | Line 65: | ||
-----------------------------------------------------------------------------------</pre> | -----------------------------------------------------------------------------------</pre> | ||
=== | === Configure license file === | ||
The configuration of your cluster account abaqus license file depends on the abaqus module version:<br> | The configuration of your cluster account abaqus license file depends on the abaqus module version:<br> | ||
Line 81: | Line 81: | ||
If your abaqus jobs fail with the error message [*** ABAQUS/eliT_CheckLicense rank 0 terminated by signal 11 (Segmentation fault)] in the slurm output file, verify that your <code>abaqus.lic</code> file contains ABAQUSLM_LICENSE_FILE to use abaqus/2020. If your abaqus jobs fail with an error message starting [License server machine is down or not responding etc] in the output file, verify that your <code>abaqus.lic</code> file contains LM_LICENSE_FILE to use abaqus/6.14.1 as shown. | If your abaqus jobs fail with the error message [*** ABAQUS/eliT_CheckLicense rank 0 terminated by signal 11 (Segmentation fault)] in the slurm output file, verify that your <code>abaqus.lic</code> file contains ABAQUSLM_LICENSE_FILE to use abaqus/2020. If your abaqus jobs fail with an error message starting [License server machine is down or not responding etc] in the output file, verify that your <code>abaqus.lic</code> file contains LM_LICENSE_FILE to use abaqus/6.14.1 as shown. | ||
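The check described above can be sketched as a small shell function. This is only an illustration: <code>check_abaqus_lic</code> is a hypothetical helper (not part of abaqus or slurm), the path to your <code>abaqus.lic</code> file is passed in by you, and the mapping of module version to variable name is taken from the paragraph above.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not an abaqus or slurm command):
# report whether a license file mentions the variable expected by the
# given abaqus module version.
check_abaqus_lic() {
    local lic_file="$1" module_version="$2" var
    case "$module_version" in
        2020)   var="ABAQUSLM_LICENSE_FILE" ;;
        6.14.1) var="LM_LICENSE_FILE" ;;
        *)      echo "unknown version: $module_version"; return 2 ;;
    esac
    # -w matches whole words only, so LM_LICENSE_FILE does not match
    # inside ABAQUSLM_LICENSE_FILE.
    if grep -qw -- "$var" "$lic_file"; then
        echo "ok: $var found"
    else
        echo "missing: $var"
        return 1
    fi
}
```

For example, <code>check_abaqus_lic /path/to/abaqus.lic 2020</code> prints whether ABAQUSLM_LICENSE_FILE appears in the file, where the path is a placeholder for wherever your license file is stored.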
=== Check license | === Check license status === | ||
To query the sharcnet license server for started jobs, queued jobs, and reservations by purchasing groups run: | To query the sharcnet license server for started jobs, queued jobs, and reservations by purchasing groups run: | ||
Line 91: | Line 91: | ||
</source> | </source> | ||
When <i>abaqus licensing lmstat</i> shows your job is "queued" this means it has entered the "R"unning state from the squeue | When <i>abaqus licensing lmstat</i> shows your job is "queued", this means it has entered the "R"unning state from the perspective of either the <code>squeue -j jobid</code> or <code>sacct -j jobid</code> commands but is waiting for a license (not started) and is therefore idle. This has the same impact on your account priority as if the job were consuming cputime and thus should be avoided. When <i>abaqus licensing lmstat</i> indicates the job is in the "start" state, it has acquired the required abaqus tokens from the license server and is consuming cputime ... | ||
=== | === Specify job resources === | ||
To | To make good use of both your Abaqus tokens and the Compute Canada resources, it is important to carefully specify the required memory and ncpus in your slurm script. The values can be determined by submitting a few short test jobs to the queue and then checking their utilization. For <b>completed</b> jobs use <code>seff JobNumber</code> to show the total "Memory Utilized" and "Memory Efficiency"; if the "Memory Efficiency" is less than ~90%, decrease the value of the "#SBATCH --mem=" setting in your slurm script accordingly. Notice that the <code>seff JobNumber</code> command also shows the total "CPU (time) Utilized" and "CPU Efficiency"; if the "CPU Efficiency" is less than ~90%, perform scaling tests to determine the optimal number of CPUs and then update the value of "#SBATCH --cpus-per-task=" in your slurm script. For <b>running</b> jobs use the <code>srun --jobid=29821580 --pty top -d 5 -u $USER</code> command to watch the %CPU, %MEM and RES consumed by each abaqus process on the compute node. The %CPU and %MEM columns show the percent usage of the total available respective resources on the node. The RES column shows the resident memory size (in human readable format for values over 1 GB) used by each abaqus process; by comparison, seff shows the sum of these. Further information on how to [https://docs.computecanada.ca/wiki/Running_jobs#Monitoring_jobs Monitor Jobs] is available in the Compute Canada wiki. | ||
</ | |||
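The ~90% rule of thumb for completed jobs can be sketched as a small filter over <code>seff</code> output. This is a minimal illustration, not an official tool: <code>check_efficiency</code> is a hypothetical helper name, and it assumes seff prints lines of the form "CPU Efficiency: NN.NN% of ..." and "Memory Efficiency: NN.NN% of ...".

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not part of slurm): read seff output
# on stdin and flag any efficiency below the threshold given as $1.
check_efficiency() {
    local threshold="${1:-90}"   # default follows the ~90% guideline
    awk -v t="$threshold" '
        /^(CPU|Memory) Efficiency:/ {
            pct = $3                 # third field, e.g. "45.23%"
            sub(/%/, "", pct)        # strip the percent sign
            status = (pct + 0 < t) ? "LOW" : "OK"
            printf "%s %s %s%%\n", status, $1, pct
        }
    '
}

# Usage on a cluster (JobNumber is a placeholder):
#   seff JobNumber | check_efficiency 90
```

A "LOW Memory" line suggests lowering "#SBATCH --mem=", while a "LOW CPU" line suggests running scaling tests before changing "#SBATCH --cpus-per-task=".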
=== Core token mapping === | === Core token mapping === |