abaqus licensing lmstat -c $LM_LICENSE_FILE -a | grep "Users of" | egrep "cae|standard|explicit"
</source>
<!--T:20858-->
When the output of query I) above indicates that a job for a particular username is queued, it means the job has entered the "R"unning state from the perspective of <code>squeue -j jobid</code> or <code>sacct -j jobid</code> and is therefore sitting idle on a compute node waiting for a license. This has the same impact on your account priority as if the job were performing computations and consuming CPU time. Eventually, when sufficient licenses become available, the queued job will start.
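For instance, you can cross-check what the scheduler reports against the lmstat output above; the job ID below is a placeholder, substitute your own:

<source lang="bash">
# Placeholder job ID; the state shows RUNNING even while the solver waits for a license
squeue -j 12345678 -o "%i %T %M %N"
sacct -j 12345678 --format=JobID,State,Elapsed,NodeList
</source>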
==== Example ==== <!--T:20661-->
<!--T:20662-->
The following shows a situation where a user submitted two 6-core jobs (each requiring 10 tokens) in quick succession. The Slurm scheduler then started each job on a different node, in the order they were submitted. Since the user had 10 Abaqus compute tokens, the first job (27527287) was able to acquire exactly enough (10) tokens for the solver to begin running. The second job (27527297), not having access to any more tokens, entered an idle "queued" state (as can be seen from the lmstat output) until the first job completed, wasting the allocated resources and lowering the user's fairshare level in the process:
<source lang="console">
[roberpj@gra-login1:~] sq
   JOBID     USER      ACCOUNT           NAME  ST  TIME_LEFT NODES CPUS TRES_PER_N MIN_MEM NODELIST (REASON)
27530366  roberpj cc-debug_cpu  scriptsp2.txt   R    9:56:13     1    6        N/A      8G gra107 (None)
27530407  roberpj cc-debug_cpu  scriptsp2.txt   R    9:59:37     1    6        N/A      8G gra292 (None)

[roberpj@gra-login1:~] abaqus licensing lmstat -c $LM_LICENSE_FILE -a | egrep "Users|start|queued"
Users of abaqus:  (Total of 78 licenses issued;  Total of 53 licenses in use)
    roberpj gra107 /dev/tty (v62.6) (license3.sharcnet.ca/27050 1042), start Mon 11/25 17:15, 10 licenses
    roberpj gra292 /dev/tty (v62.6) (license3.sharcnet.ca/27050 125) queued for 10 licenses
</source>
<!--T:20663-->
To avoid license shortage problems when submitting multiple jobs that consume expensive Abaqus tokens, either use a [https://docs.alliancecan.ca/wiki/Running_jobs#Cancellation_of_jobs_with_dependency_conditions_which_cannot_be_met job dependency] or a [https://docs.alliancecan.ca/wiki/Job_arrays job array], or at the very least set up a Slurm [https://docs.alliancecan.ca/wiki/Running_jobs#Email_notification email notification] so you know when your job completes before manually submitting another one.
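For example, a minimal sketch of the dependency approach, assuming a hypothetical submission script named <code>abaqus-job.sh</code>, is to capture the first job ID and chain the second job to it:

<source lang="bash">
# Submit the first job and capture its job ID (--parsable prints only the ID)
jobid1=$(sbatch --parsable abaqus-job.sh)
# Start the second job only after the first has finished (any exit state),
# so the two jobs never compete for the same tokens at once
sbatch --dependency=afterany:${jobid1} abaqus-job.sh
</source>

Alternatively, email notification only requires two extra directives in the submission script (the address below is a placeholder):

<source lang="bash">
#SBATCH --mail-user=username@example.com
#SBATCH --mail-type=END
</source>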
<translate>
=== Specify job resources === <!--T:20859-->
To ensure optimal usage of both your Abaqus tokens and our resources, it's important to carefully specify the required memory and ncpus in your Slurm script. The values can be determined by submitting a few short test jobs to the queue and then checking their utilization. For <b>completed</b> jobs, use <code>seff JobNumber</code> to show the total <i>Memory Utilized</i> and <i>Memory Efficiency</i>. If the <i>Memory Efficiency</i> is less than ~90%, decrease the value of the <code>#SBATCH --mem=</code> setting in your Slurm script accordingly. Note that the <code>seff JobNumber</code> command also shows the total <i>CPU (time) Utilized</i> and <i>CPU Efficiency</i>; if the <i>CPU Efficiency</i> is less than ~90%, perform scaling tests to determine the optimal number of CPUs and then update the value of <code>#SBATCH --cpus-per-task=</code> in your Slurm script. For <b>running</b> jobs, use the <code>srun --jobid=29821580 --pty top -d 5 -u $USER</code> command to watch the %CPU, %MEM and RES of each Abaqus parent process on the compute node. The %CPU and %MEM columns display the percent usage relative to the total available on the node, while the RES column shows the per-process resident memory size (in human-readable format for values over 1GB). Further information on how to [[Running jobs#Monitoring_jobs|monitor jobs]] is available on our documentation wiki.
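As an illustration, the tuning loop might look like the following; the job ID is taken from a short test run and the final resource values are placeholders to be replaced with your measured usage:

<source lang="bash">
# Inspect a completed test job (reports Memory/CPU Utilized and Efficiency)
seff 29821580

# Or watch a running test job directly on its compute node
srun --jobid=29821580 --pty top -d 5 -u $USER

# Then tighten the requests in the Slurm script to match what was actually used, e.g.
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
</source>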