Abaqus/en: Difference between revisions

Abaqus/en (view source)

Revision as of 01:22, 3 June 2020

3,442 bytes added , 4 years ago

Updating to match new version of source page

FuzzyBot

Bots

38,757

edits

@@ Line 22: / Line 22: @@
 Replace <code>port@server</code> with the port number and name of your Abaqus license server. Your license server must be reachable by our compute nodes, so your firewall will need to be configured appropriately. This usually requires our technical team to get in touch with the technical people managing your license software. Please contact our [[Technical support|technical support]] and we will provide a list of IP addresses used by our clusters and obtain the information we need on the port and IP address of your server.
-= Cluster batch job submission =
+= Cluster job submission =
-Below is a sample slurm script to submit a parallel job to a compute node using 4 cores:
+Below is a sample slurm script to submit a parallel job to a single compute node using 4 cores:
 {{File
@@ Line 30: / Line 30: @@
    |contents=
 #!/bin/bash
-#SBATCH --time=00-06:00        # days-hours:minutes
+#SBATCH --time=00-06:00        # days-hrs:mins
-#SBATCH --mem=4000M            # memory on node
+#SBATCH --mem=8G               # node memory > 5G
-#SBATCH --cpus-per-task=4      # number of cores
+#SBATCH --cpus-per-task=4      # number cores > 1
 module load abaqus/6.14.1      # (or abaqus/2020)
@@ Line 38: / Line 38: @@
 export MPI_IC_ORDER='tcp'
-abaqus job=Test input=Sample.inp scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE interactive mp_mode=threads
+abaqus job=test input=sample.inp scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE \
+   interactive mp_mode=threads memory="$((${SLURM_MEM_PER_NODE}-3072))MB"
 }}
-For a list of available Abaqus' command line options, load an abaqus module and run command: <code>abaqus -help | less</code>
+where a listing of abaqus options can be obtained by loading an abaqus module and running: <code>abaqus -help | less</code>
+== Node memory ==
+An estimate for the total slurm node memory (--mem=) required for a simulation to run fully in ram (without being virtualized to scratch disk) can be obtained by examining the abaqus output <code>test.dat</code> file.  For example a simulation that requires a fairly large amount of memory might show:
+<source lang="bash">
+                   M E M O R Y   E S T I M A T E
+ PROCESS      FLOATING PT       MINIMUM MEMORY        MEMORY TO
+              OPERATIONS           REQUIRED          MINIMIZE I/O
+             PER ITERATION           (MB)               (MB)
+          1.89E+14             3612              96345
+</source>
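The two memory columns can be pulled out of the `.dat` file with a short script. The following is only a sketch: `memory_estimate` is a hypothetical helper name, not an Abaqus command, and it assumes the first row after the banner whose last two fields are integers carries the "MINIMUM MEMORY REQUIRED" and "MEMORY TO MINIMIZE I/O" values, as in the sample output above. The demo runs on a temporary copy of that sample rather than a real `test.dat`.

```bash
#!/bin/bash
# Hypothetical helper (not part of Abaqus): print "MIN_MB MINIMIZE_IO_MB".
# Assumes the first row after the "M E M O R Y   E S T I M A T E" banner whose
# last two fields are integers holds the two memory columns, as in the sample.
memory_estimate() {
    awk '
        /M E M O R Y +E S T I M A T E/ { found = 1; next }
        found && NF >= 2 && $(NF-1) ~ /^[0-9]+$/ && $NF ~ /^[0-9]+$/ {
            print $(NF-1), $NF
            exit
        }
    ' "$1"
}

# Demo on a copy of the sample table above (a stand-in for test.dat)
sample=$(mktemp)
cat > "$sample" <<'EOF'
                   M E M O R Y   E S T I M A T E
 PROCESS      FLOATING PT       MINIMUM MEMORY        MEMORY TO
              OPERATIONS           REQUIRED          MINIMIZE I/O
             PER ITERATION           (MB)               (MB)
          1.89E+14             3612              96345
EOF
read -r min_mb io_mb < <(memory_estimate "$sample")
echo "minimum memory required: ${min_mb} MB"
echo "memory to minimize I/O:  ${io_mb} MB"
rm -f "$sample"
```

On a real run, pass the path of your own `.dat` file to the function in place of the temporary sample.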
+To run your simulation interactively and monitor the memory consumption do the following:
+<source lang="bash">
+# 1) ssh into a compute canada cluster, obtain an allocation on a compute node (such as gra100), and run abaqus, e.g.:
+    salloc --time=0:30:00 --cpus-per-task=8 --mem=64G --account=def-piname
+    module load abaqus/6.14.1    # or: module load abaqus/2020
+    unset SLURM_GTIDS
+    abaqus job=test input=Sample.inp scratch=$SCRATCH cpus=8 mp_mode=threads interactive
+# 2) ssh into the compute canada cluster again, ssh into the compute node with the allocation, and run top, e.g.:
+    ssh gra100
+    top -u $USER
+# 3) watch the VIRT and RES columns until steady memory values are observed
+</source>
+To fully satisfy the recommended "MEMORY TO MINIMIZE I/O" (MRMIO) value, at least the same amount of non-swapped physical memory (RES) must be available to abaqus.  Since the RES will in general be less than the virtual memory (VIRT) by some relatively constant amount for a given simulation, it is necessary to slightly over-allocate the requested slurm node memory <code>--mem=</code>.  In the above sample slurm script this over-allocation has been hardcoded to a conservative value of 3072MB based on initial testing of the standard abaqus solver.  To avoid long queue wait times associated with large values of MRMIO, it may be worth investigating the simulation performance impact of reducing the RES memory made available to abaqus significantly below the MRMIO.  This can be done by lowering the <code>--mem=</code> value, which in turn will set an artificially low value of <code>memory=</code> in the abaqus command (found in the last line of the slurm script).  In doing this, one should be careful that the RES does not dip below the "MINIMUM MEMORY REQUIRED" (MMR), otherwise abaqus will exit due to "Out Of Memory" (OOM).  As an example, if your MRMIO is 96GB, try running a series of short test jobs with <code>#SBATCH --mem=8G, 16G, 32G, 64G</code> until an acceptable minimal performance impact is found, noting that smaller values will result in increasingly larger scratch space use by tmpdir files.
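To see what `memory=` value each candidate `--mem=` setting would hand to abaqus, the arithmetic from the sample slurm script can be reproduced offline. This is a minimal sketch: `abaqus_memory_mb` is a hypothetical helper name, it assumes the node memory is given in MB (as Slurm reports it in `SLURM_MEM_PER_NODE`), and it reuses the 3072MB over-allocation from the sample script as its default.

```bash
#!/bin/bash
# Hypothetical helper mirroring memory="$((${SLURM_MEM_PER_NODE}-3072))MB"
# from the sample slurm script. Arguments: node memory in MB, and optionally
# the over-allocation in MB (defaults to the 3072MB used in that script).
abaqus_memory_mb() {
    local node_mb=$1 overhead_mb=${2:-3072}
    local mb=$(( node_mb - overhead_mb ))
    if (( mb <= 0 )); then
        echo "error: --mem of ${node_mb}MB leaves no memory for abaqus" >&2
        return 1
    fi
    echo "${mb}MB"
}

# Candidate --mem values from the example series (1G = 1024MB)
for gb in 8 16 32 64; do
    printf -- '--mem=%dG -> memory=%s\n' "$gb" "$(abaqus_memory_mb $(( gb * 1024 )))"
done
```

Each resulting value should still be compared by hand against the "MINIMUM MEMORY REQUIRED" figure reported in the `.dat` file before the test job is submitted.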
+= Cluster graphical use =
+Abaqus/2020 can be run interactively in graphical mode on a cluster compute node (3hr time limit) over TigerVNC with these steps:
+# [https://docs.computecanada.ca/wiki/VNC#Setup Install] TigerVNC client on your desktop
+# [https://docs.computecanada.ca/wiki/VNC#Compute_Nodes Connect] to a cluster compute node with vncviewer
+# <code>module load abaqus/2020</code>
+# <code>abaqus cae -mesa</code>
+= Gra-vdi graphical use =
+NOTE: gra-vdi is currently OFFLINE for upgrading, with a return-to-use date sometime in June
+Abaqus/2020 can be run interactively in graphical mode on gra-vdi (no connection time limit) over TigerVNC with these steps:
+# [https://docs.computecanada.ca/wiki/VNC#Setup Install] TigerVNC client on your desktop
+# [https://docs.computecanada.ca/wiki/VNC#VDI_Nodes Connect] to gra-vdi.computecanada.ca with vncviewer
+# <code>module load SnEnv</code>
+# <code>module load abaqus/2020</code>
+# <code>abaqus cae</code>
 = Site specific usage =
@@ Line 49: / Line 100: @@
 Sharcnet provides a small but free license consisting of 2 cae and 21 execute tokens, with usage limits of 10 tokens/user and 15 tokens/group.  For groups that have purchased tokens, the free token usage limits are added to their reservation.  The free tokens are available on a first-come, first-served basis and are mainly intended for testing and light usage before deciding whether or not to purchase dedicated tokens.  The license can be used by any Compute Canada member but only on SHARCNET hardware.  Groups that purchase dedicated tokens to run on the SHARCNET license server may likewise only use them on SHARCNET hardware.  Such hardware includes gra-vdi for running abaqus in full graphical mode and the graham cluster for submitting compute batch jobs to the queue.  Before you can use the license you must open a ticket at <support@computecanada.ca> and request access.  In your email 1) mention that it is for use on Sharcnet systems and 2) include a copy/paste of the following <tt>License Agreement</tt> statement with your full name and Compute Canada username entered in the indicated locations.  Please note that every user must do this, i.e. it cannot be done one time only for a group (this includes PIs who have purchased their own dedicated tokens).
-=== License agreement ===
+<b>o  License agreement</b>
 <pre>----------------------------------------------------------------------------------
@@ Line 67: / Line 118: @@
 -----------------------------------------------------------------------------------</pre>
-=== Configure license file ===
+<b>o Configure license file</b>
 The configuration of your abaqus license on each cluster depends on the module version being used:<br>
@@ Line 83: / Line 135: @@
 If your abaqus jobs fail with error message [*** ABAQUS/eliT_CheckLicense rank 0 terminated by signal 11 (Segmentation fault)] in the slurm output file verify your <code>abaqus.lic</code> file contains ABAQUSLM_LICENSE_FILE to use abaqus/2020.  If your abaqus jobs fail with error message starting [License server machine is down or not responding etc] in the output file verify your <code>abaqus.lic</code> file contains LM_LICENSE_FILE to use abaqus/6.14.1 as shown.
-=== Check license status ===
+<b>o Check license status</b>
 To query the sharcnet license server for started jobs, queued jobs, and reservations by purchasing groups run:
@@ Line 95: / Line 147: @@
 When <i>abaqus licensing lmstat</i> shows your job is "queued" this means it has entered the "R"unning state from the perspective of either the <code>squeue -j jobid</code> or <code>sacct -j jobid</code> commands but is waiting for a license (not started) and therefore idle.  This will have the same impact on your account priority as if the job were consuming cputime and thus should be avoided.  When <i>abaqus licensing lmstat</i> indicates the job is in the "start" state then it has acquired the required abaqus tokens from the license and consuming cputime ...
-=== Specify job resources ===
+<b>o Specify job resources</b>
 To ensure optimal usage of both your Abaqus tokens and the Compute Canada resources, it's important to carefully specify the required memory and ncpus in your slurm script.  The values can be determined by submitting a few short test jobs to the queue and then checking their utilization.  For <b>completed</b> jobs use <code>seff JobNumber</code> to show the total "Memory Utilized" and "Memory Efficiency"; if the "Memory Efficiency" is less than ~90%, decrease the value of the "#SBATCH --mem=" setting in your slurm script accordingly.  Notice that the <code>seff JobNumber</code> command also shows the total "CPU (time) Utilized" and "CPU Efficiency"; if the "CPU Efficiency" is less than ~90%, perform scaling tests to determine the optimal number of cpus and then update the value of "#SBATCH --cpus-per-task=" in your slurm script.  For <b>running</b> jobs use the <code>srun --jobid=29821580 --pty top -d 5 -u $USER</code> command to watch the %CPU, %MEM and RES for each abaqus parent process on the compute node; the %CPU and %MEM columns display the percent usage relative to the total available on the node, while the RES column shows the per-process resident memory size (in human readable format for values over 1gb). Further information on how to [https://docs.computecanada.ca/wiki/Running_jobs#Monitoring_jobs Monitor Jobs] is available in the Compute Canada wiki.
-=== Core token mapping ===
+<b>o Core token mapping</b>
 <pre>
@@ Line 108: / Line 160: @@
 where TOKENS = floor[5 X CORES^0.422]
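The token formula can be evaluated directly when choosing a value for `--cpus-per-task`. The snippet below is an illustrative sketch: `abaqus_tokens` is a hypothetical helper name, and it simply evaluates floor[5 X CORES^0.422] as written above, using awk for the floating-point power since bash arithmetic is integer only.

```bash
#!/bin/bash
# Hypothetical helper: evaluate TOKENS = floor[5 X CORES^0.422] for a core count.
# int() truncates toward zero, which equals floor for these positive values.
abaqus_tokens() {
    awk -v cores="$1" 'BEGIN { print int(5 * cores ^ 0.422) }'
}

# Tabulate the tokens needed for a few common core counts
for n in 1 2 4 8 16 32; do
    printf '%2d cores -> %2d tokens\n' "$n" "$(abaqus_tokens "$n")"
done
```

Because the exponent is below 1, doubling the core count costs fewer than double the tokens, which is worth weighing against the CPU efficiency measured in scaling tests.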
-=== Remote visualization ===
+<b>o Using your license</b>
-A) First install the TigerVNC client on your desktop as described in [[VNC]].  Once logged in, start abaqus as follows:
-# Connect to gra-vdi.computecanada.ca with TigerVNC
-# <code>module load SnEnv</code>
-# <code>module load abaqus</code>
-# <code>abaqus cae</code>
-B) If you want to use your own license server (not the SHARCNET license server) do the following:
-# Connect to gra-vdi.computecanada.ca with TigerVNC
-# module load SnEnv
-# module load abaqus
-# export ABAQUSLM_LICENSE_FILE="port@server"
-# abaqus cae
-You must first request your "port@server" be setup by submitting a ticket before jobs can be run on graham.
-<!--T:70-->
+To use your own license server (instead of the default SHARCNET license) on gra-vdi, as described in the "Gra-vdi graphical use" section above, run the command <code>export ABAQUSLM_LICENSE_FILE="port@server"</code> after loading the abaqus module and before running <code>abaqus cae</code>.
 == Western license ==
-<!--T:72-->
 The Western site license may only be used by Western researchers with hardware located on Western's campus such as the Dusky legacy cluster. Graham and gra-vdi are excluded since they are located at Waterloo (use the Sharcnet License for these systems as described above).  Contact the Western abaqus license server administrator (located in Robarts) to make arrangements before attempting to use the Western abaqus license.  Submit a ticket to Compute Canada support to request the admin's contact information if necessary.  You will need to provide your Compute Canada username and likely make arrangements to purchase tokens.  If you are granted access, request the port and server values and enter them into your abaqus.lic file as shown in 1) near the top of this wiki, which will in turn be used by the Compute Canada module on dusky when it loads.
