Abaqus: Difference between revisions

__FORCETOC__
<translate>
<!--T:20819-->
[https://www.3ds.com/products-services/simulia/products/abaqus/ Abaqus FEA] is a software suite for finite element analysis and computer-aided engineering.


= Using your own license = <!--T:20820-->
Abaqus software modules are available on our clusters; however, you must provide your own license. To configure your account on a cluster, log in and create a file named <code>$HOME/.licenses/abaqus.lic</code> containing the following two lines, which support versions 202X and 6.14.1 respectively. Next, replace <code>port@server</code> with the flexlm port number and the IP address (or fully qualified hostname) of your Abaqus license server.


<!--T:20821-->
{{File
|name=abaqus.lic
|contents=
prepend_path("LM_LICENSE_FILE","port@server")
prepend_path("ABAQUSLM_LICENSE_FILE","port@server")
}}


<!--T:20822-->
If your license has not been set up for use on an Alliance cluster, some additional configuration changes must be made by an Alliance system administrator and by your local system administrator. These changes are necessary to ensure that the flexlm and vendor TCP ports of your Abaqus server are reachable from all cluster compute nodes when jobs are run via the queue. So that we may help you get this done, write to [[Technical support|technical support]] and be sure to include the following three items:

* flexlm port number
* static vendor port number
* IP address of your Abaqus license server.

You will then be sent a list of cluster IP addresses so that your administrator can open the local server firewall to allow connections from the cluster on both ports. Please note that a special license agreement must generally be negotiated and signed by SIMULIA and your institution before a local license may be used remotely on Alliance hardware.
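Before writing to support, you can check whether the flexlm port of your license server is reachable from a cluster login node. This is only a sketch: the <code>port@server</code> value below is a placeholder, not a real license server.

```shell
#!/bin/bash
# Hypothetical license string; substitute the port@server value from your
# own abaqus.lic file.
LICENSE="27000@license.example.com"
port="${LICENSE%%@*}"    # text before '@' is the flexlm port
server="${LICENSE#*@}"   # text after '@' is the license host
echo "checking ${server} on port ${port}"
# Uncomment on a login node to test TCP reachability (5 second timeout):
# timeout 5 bash -c "exec 3<>/dev/tcp/${server}/${port}" && echo reachable
```

A successful connection only shows that the flexlm port is open; the vendor daemon port must also be reachable for jobs to check out licenses.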


= Cluster job submission = <!--T:20823-->
Below are prototype Slurm scripts for submitting thread-based and mpi-based parallel simulations to single or multiple compute nodes.  Most users will find it sufficient to use one of the <b>project directory scripts</b> provided in the <i>Single node computing</i> sections.  The optional <code>memory=</code> argument found in the last line of the scripts is intended for larger memory or problematic jobs, where the 3072MB offset value may require tuning.  A listing of all Abaqus command line arguments can be obtained by loading an Abaqus module and running <code>abaqus -help | less</code>.

<!--T:20824-->
Single node jobs that run for less than one day should find the <i>project directory script</i> located in the first tab sufficient. However, single node jobs that run for more than a day should use one of the restart scripts.  Jobs that create large restart files will benefit from writing to local disk through the use of the SLURM_TMPDIR environment variable, utilized in the <b>temporary directory scripts</b> provided in the two rightmost tabs of the single node standard and explicit analysis sections.  The restart scripts shown here will continue jobs that have been terminated early for some reason.  Such job failures can occur if a job reaches its maximum requested runtime before completing and is killed by the queue, or if the compute node the job was running on crashes due to an unexpected hardware failure.  Other restart types are possible by further tailoring of the input file (not shown here) to continue a job with additional steps or change the analysis (see the documentation for version specific details).

<!--T:20825-->
Jobs that require large memory or larger compute resources (beyond what a single compute node can provide) should use the mpi scripts in the <b>multiple node sections</b> below to distribute computing over arbitrary node ranges determined automatically by the scheduler.  Short scaling test jobs should be run to determine wall-clock times (and memory requirements) as a function of the number of cores (2, 4, 8, etc.) to find the optimal number before running any long jobs.

== Standard analysis == <!--T:20826-->
Abaqus solvers support thread-based and mpi-based parallelization.  Scripts for each type are provided below for running standard analysis type jobs on single or multiple nodes respectively.  Scripts to perform multiple node job restarts are not currently provided.
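The short scaling tests recommended above can be queued with a simple loop. This is only a sketch: <code>scriptsp1.sh</code> is a hypothetical copy of one of the job scripts below, and the leading <code>echo</code> makes this a dry run so the generated commands can be inspected before removing it to submit for real.

```shell
#!/bin/bash
# Dry-run sketch of a core-count scaling study; remove 'echo' to submit.
# 'scriptsp1.sh' is a placeholder name for your own job script.
for ncores in 2 4 8 16; do
  echo sbatch --cpus-per-task="$ncores" --mem=8000M --time=0:30:00 scriptsp1.sh
done
```

Compare the resulting wall-clock times from each run and pick the smallest core count beyond which the speedup stops being worthwhile.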


=== Single node computing === <!--T:20827-->

<!--T:20868-->
<tabs>
<tab name="project directory script">
{{File
   |name="scriptsp1.txt"
#SBATCH --nodes=1              # Do not change !

<!--T:20869-->
module load StdEnv/2020        # Latest installed version
module load abaqus/2021        # Latest installed version

<!--T:20870-->
#module load StdEnv/2016       # Uncomment to use
#module load abaqus/2020       # Uncomment to use

<!--T:20871-->
unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "ABAQUSLM_LICENSE_FILE=$ABAQUSLM_LICENSE_FILE"

<!--T:20872-->
rm -f testsp1* testsp2*
abaqus job=testsp1 input=mystd-sim.inp \
   scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE interactive \
   mp_mode=threads memory="$((${SLURM_MEM_PER_NODE}-3072))MB"
}}
 
<!--T:20828-->
To write restart data every N=12 time increments specify in the input file:
  *RESTART, WRITE, OVERLAY, FREQUENCY=12
To write restart data for a total of 12 time increments specify instead:
  *RESTART, WRITE, OVERLAY, NUMBER INTERVAL=12, TIME MARKS=NO
To check for completed restart information do:
  egrep -i "step|start" testsp*.com testsp*.msg testsp*.sta
Some simulations may benefit from adding the following to the Abaqus command at the bottom of the script:
  order_parallel=OFF
 
<!--T:20873-->
</tab>
<tab name="project directory restart script">
{{File
   |name="scriptsp2.txt"
#SBATCH --nodes=1              # Do not change !

<!--T:20874-->
module load StdEnv/2020        # Latest installed version
module load abaqus/2021        # Latest installed version

<!--T:20917-->
unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "ABAQUSLM_LICENSE_FILE=$ABAQUSLM_LICENSE_FILE"

<!--T:20875-->
rm -f testsp2* testsp1.lck
abaqus job=testsp2 oldjob=testsp1 input=mystd-sim-restart.inp \
   scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE interactive \
   mp_mode=threads memory="$((${SLURM_MEM_PER_NODE}-3072))MB"
}}
 
<!--T:20829-->
The restart input file should contain:
  *HEADING
  *RESTART, READ

<!--T:20876-->
</tab>
<tab name="temporary directory script">
{{File
   |name="scriptst1.txt"
#SBATCH --nodes=1              # Do not change !

<!--T:20877-->
module load StdEnv/2020        # Latest installed version
module load abaqus/2021        # Latest installed version

<!--T:20878-->
unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "SLURM_TMPDIR = " $SLURM_TMPDIR

<!--T:20879-->
rm -f testst1* testst2*
cd $SLURM_TMPDIR
while sleep 6h; do
   cp -f * $SLURM_SUBMIT_DIR 2>/dev/null
done &
WPID=$!
abaqus job=testst1 input=$SLURM_SUBMIT_DIR/mystd-sim.inp \
   scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE interactive \
   mp_mode=threads memory="$((${SLURM_MEM_PER_NODE}-3072))MB"
{ kill $WPID && wait $WPID; } 2>/dev/null
cp -f * $SLURM_SUBMIT_DIR
}}
 
<!--T:20830-->
To write restart data every N=12 time increments specify in the input file:
  *RESTART, WRITE, OVERLAY, FREQUENCY=12
To write restart data for a total of 12 time increments specify instead:
  *RESTART, WRITE, OVERLAY, NUMBER INTERVAL=12, TIME MARKS=NO
To check the completed restart information do:
  egrep -i "step|start" testst*.com testst*.msg testst*.sta
 
<!--T:20880-->
</tab>
<tab name="temporary directory restart script">
{{File
   |name="scriptst2.txt"
#SBATCH --nodes=1              # Do not change !

<!--T:20881-->
module load StdEnv/2020        # Latest installed version
module load abaqus/2021        # Latest installed version

<!--T:20882-->
unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "SLURM_TMPDIR = " $SLURM_TMPDIR

<!--T:20883-->
rm -f testst2* testst1.lck
cp testst1* $SLURM_TMPDIR
cd $SLURM_TMPDIR
while sleep 3h; do
   cp -f testst2* $SLURM_SUBMIT_DIR 2>/dev/null
done &
WPID=$!
abaqus job=testst2 oldjob=testst1 input=$SLURM_SUBMIT_DIR/mystd-sim-restart.inp \
   scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE interactive \
   mp_mode=threads memory="$((${SLURM_MEM_PER_NODE}-3072))MB"
{ kill $WPID && wait $WPID; } 2>/dev/null
cp -f testst2* $SLURM_SUBMIT_DIR
}}
 
<!--T:20831-->
The restart input file should contain:
  *HEADING
  *RESTART, READ

<!--T:20884-->
</tab>
</tabs>
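The temporary directory scripts above rely on a background loop that periodically copies results back to the submission directory and is killed once the solver exits. That pattern can be tried in isolation with a dummy command standing in for abaqus; the short sleeps and temporary directories here are for illustration only.

```shell
#!/bin/bash
# Minimal demo of the background periodic-copy pattern from the tmpdir scripts.
workdir=$(mktemp -d)     # stands in for $SLURM_TMPDIR
submitdir=$(mktemp -d)   # stands in for $SLURM_SUBMIT_DIR
cd "$workdir"
while sleep 1; do
   cp -f ./* "$submitdir" 2>/dev/null
done &
WPID=$!
echo "result" > output.txt   # stands in for the abaqus run
sleep 2                      # let the loop copy at least once
{ kill $WPID && wait $WPID; } 2>/dev/null
cp -f ./* "$submitdir"       # final copy guarantees nothing is missed
cat "$submitdir/output.txt"  # prints "result"
```

The final unconditional copy after the <code>kill</code> is what makes the periodic loop safe: even if the job ends between iterations, the last state of the working directory is still saved.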


=== Multiple node computing === <!--T:20832-->
Users with large memory or compute needs (and correspondingly large licenses) can use the following script to perform mpi-based computing over an arbitrary range of nodes, ideally left to the scheduler to determine automatically.  A companion template script to perform multinode job restarts is not currently provided due to additional limitations on when it can be used.

<!--T:20885-->
{{File
   |name="scriptsp1-mpi.txt"
#SBATCH --cpus-per-task=1      # Do not change !

<!--T:20886-->
module load StdEnv/2020        # Latest installed version
module load abaqus/2021        # Latest installed version

<!--T:20887-->
unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "ABAQUSLM_LICENSE_FILE=$ABAQUSLM_LICENSE_FILE"

<!--T:20888-->
rm -f testsp1-mpi*

<!--T:20889-->
unset hostlist
nodes="$(slurm_hl2hl.py --format MPIHOSTLIST {{!}} xargs)"
echo "$mphostlist" > abaqus_v6.env

<!--T:20890-->
abaqus job=testsp1-mpi input=mystd-sim.inp \
   scratch=$SCRATCH cpus=$SLURM_NTASKS interactive mp_mode=mpi
}}
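The <code>abaqus_v6.env</code> file written by the script holds an <code>mp_host_list</code> entry in Python-list syntax. The portion of the script that builds it from the node list is elided above; the sketch below shows one way such an entry can be assembled, using placeholder node names and core counts (the script's actual loop may differ).

```shell
#!/bin/bash
# Sketch: assemble an Abaqus mp_host_list line from a space-separated node
# list. 'node1 node2' and the core count are placeholders for this demo.
nodes="node1 node2"
cores_per_node=4
hostlist=""
for n in $nodes; do
   hostlist+="['${n}',${cores_per_node}],"
done
mphostlist="mp_host_list=[${hostlist%,}]"
echo "$mphostlist"   # prints "mp_host_list=[['node1',4],['node2',4]]"
```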


== Explicit analysis == <!--T:20833-->
Abaqus solvers support thread-based and mpi-based parallelization.  Scripts for each type are provided below for running explicit analysis type jobs on single or multiple nodes respectively.  Template scripts to perform multinode job restarts are not currently provided pending further testing.

=== Single node computing === <!--T:20834-->

<!--T:20891-->
<tabs>
<tab name="project directory script">
{{File
   |name="scriptep1.txt"
#SBATCH --account=def-group    # specify account
#SBATCH --time=00-06:00        # days-hrs:mins
#SBATCH --mem=8000M            # node memory > 5G
#SBATCH --cpus-per-task=4      # number cores > 1
#SBATCH --nodes=1              # do not change

<!--T:20892-->
module load abaqus/2021

<!--T:20893-->
unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "ABAQUSLM_LICENSE_FILE=$ABAQUSLM_LICENSE_FILE"

<!--T:20894-->
rm -f testep1* testep2*
abaqus job=testep1 input=myexp-sim.inp \
   scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE interactive \
   mp_mode=threads memory="$((${SLURM_MEM_PER_NODE}-3072))MB"
}}
 
<!--T:20835-->
To write restart data for a total of 12 time increments specify in the input file:
  *RESTART, WRITE, OVERLAY, NUMBER INTERVAL=12, TIME MARKS=NO
Check for completed restart information in relevant output files:
  egrep -i "step|restart" testep*.com testep*.msg testep*.sta

<!--T:20895-->
</tab>
<tab name="project directory restart script">
{{File
   |name="scriptep2.txt"
#SBATCH --account=def-group    # specify account
#SBATCH --time=00-06:00        # days-hrs:mins
#SBATCH --mem=8000M            # node memory > 5G
#SBATCH --cpus-per-task=4      # number cores > 1
#SBATCH --nodes=1              # do not change

<!--T:20896-->
module load abaqus/2021

<!--T:20897-->
unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "ABAQUSLM_LICENSE_FILE=$ABAQUSLM_LICENSE_FILE"

<!--T:20898-->
rm -f testep2* testep1.lck
for f in testep1*; do [[ -f ${f} ]] && cp -a "$f" "testep2${f#testep1}"; done
abaqus job=testep2 input=myexp-sim.inp recover \
   scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE interactive \
   mp_mode=threads memory="$((${SLURM_MEM_PER_NODE}-3072))MB"
}}
<!--T:20836-->
No input file modifications are required to restart the analysis.

<!--T:20899-->
</tab>
<tab name="temporary directory script">
{{File
   |name="scriptet1.txt"
#SBATCH --account=def-group    # specify account
#SBATCH --time=00-06:00        # days-hrs:mins
#SBATCH --mem=8000M            # node memory > 5G
#SBATCH --cpus-per-task=4      # number cores > 1
#SBATCH --nodes=1              # do not change

<!--T:20900-->
module load abaqus/2021

<!--T:20901-->
unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "SLURM_TMPDIR = " $SLURM_TMPDIR

<!--T:20902-->
rm -f testet1* testet2*
cd $SLURM_TMPDIR
while sleep 6h; do
   cp -f * $SLURM_SUBMIT_DIR 2>/dev/null
done &
WPID=$!
abaqus job=testet1 input=$SLURM_SUBMIT_DIR/myexp-sim.inp \
   scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE interactive \
   mp_mode=threads memory="$((${SLURM_MEM_PER_NODE}-3072))MB"
{ kill $WPID && wait $WPID; } 2>/dev/null
cp -f * $SLURM_SUBMIT_DIR
}}
 
<!--T:20837-->
To write restart data for a total of 12 time increments specify in the input file:
  *RESTART, WRITE, OVERLAY, NUMBER INTERVAL=12, TIME MARKS=NO
Check for completed restart information in relevant output files:
  egrep -i "step|restart" testet*.com testet*.msg testet*.sta

<!--T:20903-->
</tab>
<tab name="temporary directory restart script">
{{File
   |name="scriptet2.txt"
#SBATCH --account=def-group    # specify account
#SBATCH --time=00-06:00        # days-hrs:mins
#SBATCH --mem=8000M            # node memory > 5G
#SBATCH --cpus-per-task=4      # number cores > 1
#SBATCH --nodes=1              # do not change

<!--T:20904-->
module load abaqus/2021

<!--T:20905-->
unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "SLURM_TMPDIR = " $SLURM_TMPDIR

<!--T:20906-->
rm -f testet2* testet1.lck
for f in testet1*; do cp -a "$f" $SLURM_TMPDIR/"testet2${f#testet1}"; done
cd $SLURM_TMPDIR
while sleep 3h; do
   cp -f * $SLURM_SUBMIT_DIR 2>/dev/null
done &
WPID=$!
abaqus job=testet2 input=$SLURM_SUBMIT_DIR/myexp-sim.inp recover \
   scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE interactive \
   mp_mode=threads memory="$((${SLURM_MEM_PER_NODE}-3072))MB"
{ kill $WPID && wait $WPID; } 2>/dev/null
cp -f * $SLURM_SUBMIT_DIR
}}
<!--T:20838-->
No input file modifications are required to restart the analysis.

<!--T:20907-->
</tab>
</tabs>


=== Multiple node computing === <!--T:20839-->

<!--T:20908-->
{{File
   |name="scriptep1-mpi.txt"
#SBATCH --account=def-group    # Specify account
#SBATCH --time=00-06:00        # Specify days-hrs:mins
#SBATCH --ntasks=8             # Specify number of cores
#SBATCH --mem-per-cpu=16000M   # Specify memory per core
# SBATCH --nodes=2             # Specify number of nodes (optional)
#SBATCH --cpus-per-task=1      # Do not change !

<!--T:20909-->
module load StdEnv/2020        # Latest installed version
module load abaqus/2021        # Latest installed version

<!--T:20910-->
unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
export I_MPI_HYDRA_TOPOLIB=ipl # Needed when using abaqus/2021
echo "LM_LICENSE_FILE=$LM_LICENSE_FILE"
echo "ABAQUSLM_LICENSE_FILE=$ABAQUSLM_LICENSE_FILE"


<!--T:20911-->
rm -f testep1-mpi*

<!--T:20912-->
unset hostlist
nodes="$(slurm_hl2hl.py --format MPIHOSTLIST {{!}} xargs)"
echo "$mphostlist" > abaqus_v6.env

<!--T:20913-->
abaqus job=testep1-mpi input=myexp-sim.inp \
   scratch=$SCRATCH cpus=$SLURM_NTASKS interactive mp_mode=mpi
}}


== Node memory == <!--T:20840-->
An estimate for the total Slurm node memory (--mem=) required for a simulation to run fully in RAM (without being virtualized to scratch disk) can be obtained by examining the Abaqus output <code>test.dat</code> file.  For example, a simulation that requires a fairly large amount of memory might show:

<source lang="bash">
                   M E M O R Y  E S T I M A T E
</source>

<!--T:20841-->
To run your simulation interactively and monitor the memory consumption, do the following:<br>
1) ssh into a cluster, obtain an allocation on a compute node (such as gra100), and run abaqus, i.e.,
{{Commands
|salloc --time=0:30:00 --cpus-per-task=8 --mem=64G --account=def-piname
|module load abaqus/6.14.1  OR  module load abaqus/2020
|unset SLURM_GTIDS
|abaqus job=test input=Sample.inp scratch=$SCRATCH cpus=8 mp_mode=threads interactive
}}


<!--T:20844-->
To completely satisfy the recommended "MEMORY TO OPERATIONS REQUIRED MINIMIZE I/O" (MRMIO) value, at least the same amount of non-swapped physical memory (RES) must be available to Abaqus.  Since the RES will in general be less than the virtual memory (VIRT) by some relatively constant amount for a given simulation, it is necessary to slightly over-allocate the requested Slurm node memory <code>--mem=</code>.  In the above sample Slurm script, this over-allocation has been hardcoded to a conservative value of 3072MB based on initial testing of the standard Abaqus solver.  To avoid long queue wait times associated with large values of MRMIO, it may be worth investigating the simulation performance impact associated with reducing the RES memory that is made available to Abaqus significantly below the MRMIO.  This can be done by lowering the <code>--mem=</code> value which in turn will set an artificially low value of <code>memory=</code> in the Abaqus command (found in the last line of the Slurm script).  In doing this one should be careful that the RES does not dip below the MINIMUM MEMORY REQUIRED (MMR), otherwise Abaqus will exit due to Out of Memory (OOM).  As an example, if your MRMIO is 96GB, try running a series of short test jobs with <code>#SBATCH --mem=8G, 16G, 32G, 64G</code> until an acceptable minimal performance impact is found, noting that smaller values will result in increasingly larger scratch space used by temporary files.
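The over-allocation arithmetic described above can be sketched numerically. This is an illustrative stand-alone snippet, not part of any Abaqus script: the value of <code>SLURM_MEM_PER_NODE</code> is hardcoded here to what a job submitted with <code>#SBATCH --mem=8G</code> would receive.

```bash
#!/bin/bash
# Sketch of the memory= calculation used in the job scripts: subtract the
# 3072MB offset from the Slurm allocation (SLURM_MEM_PER_NODE is in MB).
SLURM_MEM_PER_NODE=8192            # illustrative, corresponds to --mem=8G
ABQ_MEM="$((SLURM_MEM_PER_NODE - 3072))MB"
echo "memory=${ABQ_MEM}"           # value passed to the abaqus command
```

With an 8G allocation this leaves 5120MB for the solver; lowering <code>--mem=</code> lowers the <code>memory=</code> value in lockstep.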


= Graphical use = <!--T:20845-->
Abaqus can be run interactively in graphical mode on a cluster or gra-vdi using VNC by following these steps:


== On a cluster == <!--T:20846-->
1. Connect to a compute node (3hr salloc time limit) with [[VNC#Compute_nodes|TigerVNC]]<br>
2. Open a new terminal window and enter the following<br>
 module load StdEnv/2020 abaqus/2021
3. Start the application with<br>
 abaqus cae -mesa


== On gra-vdi == <!--T:20847-->
1. Connect to gra-vdi with [[VNC#VDI_nodes|TigerVNC]]<br><br>
2. Open a new terminal window and enter one of the following:
 module load CcEnv StdEnv/2016 abaqus/6.14.1 or
 module load CcEnv StdEnv/2016 abaqus/2020 or
 module load CcEnv StdEnv/2020 abaqus/2021
3. Start the application with<br>
 abaqus cae


<!--T:20848-->
There must be at least <b>one</b> cae license not in use for <code>abaqus cae</code> to start, according to:
 abaqus licensing lmstat -c $ABAQUSLM_LICENSE_FILE -a | grep "Users of cae"


<!--T:20918-->
For example, the SHARCNET license has 2 free and 2 reserved licenses. If all 4 are in use, the following error message will occur:


<!--T:20919-->
<source lang="bash">
[gra-vdi3:~] abaqus licensing lmstat -c $ABAQUSLM_LICENSE_FILE -a | grep "Users of cae"
Users of cae: (Total of 4 licenses issued;  Total of 4 licenses in use)


<!--T:208-->
<!--T:20920-->
[gra-vdi3:~] abaqus cae
ABAQUSLM_LICENSE_FILE=27050@license3.sharcnet.ca
/opt/sharcnet/abaqus/2020/Commands/abaqus cae
No socket connection to license server manager.
Feature:      cae
License path:  27050@license3.sharcnet.ca:
FLEXnet Licensing error:-7,96
For further information, refer to the FLEXnet Licensing documentation,
or contact your local Abaqus representative.
Number of requested licenses: 1
Number of total licenses:    4
Number of licenses in use:    2
Number of available licenses: 2
Abaqus Error: Abaqus/CAE Kernel exited with an error.
</source>
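When scripting around license checks, the totals can be parsed directly out of the lmstat output. The snippet below is a sketch (not an Abaqus utility); it feeds the sample "Users of cae" line shown above through awk to count free licenses:

```bash
#!/bin/bash
# Count free cae licenses from a flexlm "Users of" line.
# The sample line is copied from the lmstat output shown above.
line='Users of cae:  (Total of 4 licenses issued; Total of 4 licenses in use)'
echo "$line" | awk '{
  for (i = 1; i <= NF; i++) {
    if ($i == "issued;") issued = $(i-2)   # count sits two fields earlier
    if ($i == "use)")    used   = $(i-3)   # count sits three fields earlier
  }
  print "free cae licenses:", issued - used
}'
```

In a real check you would replace the hardcoded line with the live <code>abaqus licensing lmstat</code> output.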
 


= Site-specific use = <!--T:20850-->
== SHARCNET license ==
SHARCNET provides a small but free license consisting of 2 cae and 35 execute tokens, with usage limits of 10 tokens/user and 15 tokens/group.  For groups that have purchased dedicated tokens, the free token usage limits are added to their reservation.  The free tokens are available on a first-come, first-served basis and are mainly intended for testing and light usage before deciding whether or not to purchase dedicated tokens.  Costs for dedicated tokens (in 2021) were approximately CAD$110 per compute token and CAD$400 per GUI token; submit a ticket to request an official quote.  The license can be used by any Alliance researcher, but only on SHARCNET hardware.  Groups that purchase dedicated tokens to run on the SHARCNET license server may likewise only use them on SHARCNET hardware, including gra-vdi (for running Abaqus in full graphical mode) and the Graham and Dusky clusters (for submitting compute batch jobs to the queue).  Before you can use the license you must contact [[Technical support]] and request access.  In your email 1) mention that it is for use on SHARCNET systems and 2) include a copy/paste of the following <code>License Agreement</code> statement with your full name and username entered in the indicated locations.  Please note that every user must do this; it cannot be done one time only for a group, and this includes PIs who have purchased their own dedicated tokens.


=== License agreement === <!--T:20851-->
<pre>----------------------------------------------------------------------------------
Subject: Abaqus SHARCNET Academic License User Agreement


<!--T:20852-->
This email is to confirm that I "_____________" with username "___________" will
only use “SIMULIA Academic Software” with tokens from the SHARCNET license server
for the following purposes:


<!--T:20853-->
1) on SHARCNET hardware where the software is already installed
2) in affiliation with a Canadian degree-granting academic institution
3) for education, institutional or instruction purposes and not for any commercial
   or contract-related purposes where results are not publishable
4) for experimental, theoretical and/or digital research work, undertaken primarily
   to acquire new knowledge of the underlying foundations of phenomena and observable
-----------------------------------------------------------------------------------</pre>


=== Configure license file === <!--T:20854-->
Configure your license file as follows, noting that it is only usable on SHARCNET systems: Graham, gra-vdi and Dusky.
 
</translate>
 
<source lang="bash">
[gra-login1:~] cat ~/.licenses/abaqus.lic
prepend_path("LM_LICENSE_FILE","27050@license3.sharcnet.ca")
prepend_path("ABAQUSLM_LICENSE_FILE","27050@license3.sharcnet.ca")
</source>


<translate>
<!--T:20855-->
 
If your Abaqus jobs fail with the error message [*** ABAQUS/eliT_CheckLicense rank 0 terminated by signal 11 (Segmentation fault)] in the Slurm output file, verify that your <code>abaqus.lic</code> file contains ABAQUSLM_LICENSE_FILE (required by abaqus/2020).  If your Abaqus jobs fail with an error message starting [License server machine is down or not responding, etc.] in the output file, verify that your <code>abaqus.lic</code> file contains LM_LICENSE_FILE (required by abaqus/6.14.1) as shown.  The <code>abaqus.lic</code> file shown contains both, so you should not see either problem.


=== Query license server === <!--T:20661-->
Log into graham, load abaqus and then run one of the following:
<source lang="bash">
ssh graham.alliancecan.ca
module load StdEnv/2020
module load abaqus
</source>


<!--T:20921-->
I) Check the SHARCNET license server for started and queued jobs:
 
</translate>
<source lang="bash">
abaqus licensing lmstat -c $LM_LICENSE_FILE -a | egrep "Users|start|queued"
</source>
<translate>
<!--T:20856-->
II) Check the SHARCNET license server for started and queued jobs also showing reservations by purchasing groups:
</translate>
<source lang="bash">
abaqus licensing lmstat -c $LM_LICENSE_FILE -a | egrep "Users|start|queued|RESERVATION"
</source>
 
<translate>
<!--T:20857-->
III) Check the SHARCNET license server for only cae, standard and explicit product availability:
 
</translate>
<source lang="bash">
abaqus licensing lmstat -c $LM_LICENSE_FILE -a | grep "Users of" | egrep "cae|standard|explicit"
</source>
<translate>
<!--T:20858-->
When the output of query I) above indicates that a job for a particular username is queued this means the job has entered the "R"unning state from the perspective of <code>squeue -j jobid</code> or <code>sacct -j jobid</code> and is therefore idle on a compute node waiting for a license.  This will have the same impact on your account priority as if the job were performing computations and consuming CPU time.  Eventually when sufficient licenses come available the queued job will start.  To demonstrate, the following shows the license server and queue output for the situation where a user submits two jobs, but only the first job acquires enough licenses to start:
</translate>


<!--T:2087-->
  [roberpj@dus241:~] sq
           JOBID    USER      ACCOUNT          NAME  ST  TIME_LEFT  NODES  CPUS  MIN_MEM  NODELIST (REASON)  
           29802  roberpj  def-roberpj  scriptsp1.txt  R    2:59:33      1    12      8G  dus28 (None)  


<!--T:2088-->
  [roberpj@dus241:~] abaqus licensing lmstat -c $LM_LICENSE_FILE -a | egrep "Users|start|queued|RESERVATION"
   Users of abaqus:  (Total of 78 licenses issued;  Total of 71 licenses in use)
       roberpj dus47 /dev/tty (v62.2) (license3.sharcnet.ca/27050 275), start Thu 8/27 5:45, 14 licenses
       roberpj dus28 /dev/tty (v62.2) (license3.sharcnet.ca/27050 729) queued for 14 licenses

<translate>
=== Specify job resources === <!--T:20859-->
 
To ensure optimal usage of both your Abaqus tokens and our resources, it's important to carefully specify the required memory and ncpus in your Slurm script.  The values can be determined by submitting a few short test jobs to the queue then checking their utilization.  For <b>completed</b> jobs use <code>seff JobNumber</code> to show the total <i>Memory Utilized</i> and <i>Memory Efficiency</i>. If the <i>Memory Efficiency</i> is less than ~90%, decrease the value of the <code>#SBATCH --mem=</code> setting in your Slurm script accordingly.  Notice that the <code>seff JobNumber</code> command also shows the total <i>CPU (time) Utilized</i> and <i>CPU Efficiency</i>. If the <i>CPU Efficiency</i> is less than ~90%, perform scaling tests to determine the optimal number of CPUs and then update the value of <code>#SBATCH --cpus-per-task=</code> in your Slurm script.  For <b>running</b> jobs, use the <code>srun --jobid=29821580 --pty top -d 5 -u $USER</code> command to watch the %CPU, %MEM and RES for each Abaqus parent process on the compute node. The %CPU and %MEM columns display the percent usage relative to the total available on the node, while the RES column shows the per-process resident memory size (in human-readable format for values over 1GB). Further information regarding how to [[Running jobs#Monitoring_jobs|monitor jobs]] is available on our documentation wiki.
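The decision rule above can be sketched in a few lines of shell: pull the <i>Memory Efficiency</i> figure out of seff-style output and compare it against the ~90% threshold. The sample line below is hypothetical, for illustration only:

```bash
#!/bin/bash
# Parse a seff-style "Memory Efficiency" line and flag over-allocation.
# The sample values are made up; in practice, capture `seff JobNumber` output.
seff_line='Memory Efficiency: 45.00% of 8.00 GB'
eff=$(echo "$seff_line" | awk -F'[:%]' '{ gsub(/ /, "", $2); print $2 }')
awk -v e="$eff" 'BEGIN { print (e < 90) ? "decrease --mem accordingly" : "memory request OK" }'
```

The same pattern applies to the <i>CPU Efficiency</i> line when deciding on <code>--cpus-per-task=</code>.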
 


=== Core token mapping === <!--T:20860-->
</translate>
<pre>
TOKENS 5  6  7  8  10  12  14  16  19  21  25  28  34  38
CORES  1  2  3  4  6  8  12  16  24  32  48  64  96 128
</pre>
 
<translate>
<!--T:20861-->
where TOKENS = floor[5 X CORES^0.422]
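The mapping can be computed directly from the formula; a small sketch using awk reproduces the table values:

```bash
#!/bin/bash
# Compute abaqus execute tokens from core count: TOKENS = floor(5 * CORES^0.422)
tokens() { awk -v c="$1" 'BEGIN { printf "%d\n", int(5 * c^0.422) }'; }
for cores in 1 4 16 128; do
  echo "${cores} cores -> $(tokens "$cores") tokens"
done
# -> 1 cores -> 5 tokens
# -> 4 cores -> 8 tokens
# -> 16 cores -> 16 tokens
# -> 128 cores -> 38 tokens
```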


== Western license == <!--T:20862-->
The Western site license may only be used by Western researchers on hardware located at Western's campus.  Currently, the Dusky cluster is the only system that satisfies these conditions. Graham and gra-vdi are excluded since they are located on Waterloo's campus.  Contact the Western Abaqus license server administrator <jmilner@robarts.ca> to inquire about using the Western Abaqus license.  You will need to provide your username and possibly make arrangements to purchase tokens.  If you are granted access then you may proceed to configure your <code>abaqus.lic</code> file to point to the Western license server as follows:
 


=== Configure license file === <!--T:20863-->
Configure your license file as follows, noting that it is only usable on Dusky.


</translate>
<source lang="bash">
[dus241:~] cat .licenses/abaqus.lic
prepend_path("LM_LICENSE_FILE","27000@license4.sharcnet.ca")
prepend_path("ABAQUSLM_LICENSE_FILE","27000@license4.sharcnet.ca")
</source>
<translate>
<!--T:20864-->
Once configured, submit your job as described in the <i>Cluster job submission</i> section above.  If there are any problems, submit a problem ticket to [[Technical support|technical support]].  Specify that you are using the Abaqus Western license on Dusky and provide the failed job number along with a paste of any error messages as applicable.
= Online documentation = <!--T:20865-->
The full Abaqus documentation (latest version) can be accessed on gra-vdi as shown in the following steps.
<!--T:20866-->
Account preparation:
# connect to <b>gra-vdi.computecanada.ca</b> with tigervnc as described [[VNC#VDI_Nodes | here]]
# open a terminal window on gra-vdi and type <code>firefox</code> (hit enter)
# in the address bar type <code>about:config</code> (hit enter) -> click the <i>I accept the risk!</i> button
# in the search bar type <code>unique</code> then double click <code>privacy.file_unique_origin</code> to change true to false


<!--T:20867-->
View documentation:
# connect to <b>gra-vdi.computecanada.ca</b> with tigervnc as described [[VNC#VDI_Nodes | here]]
# open a terminal window on gra-vdi and type <code>firefox </code> (hit enter)
# in the address bar copy/paste one of the following:<br><code>file:///opt/sharcnet/abaqus/2020/doc/English/DSSIMULIA_Established.htm</code>, or<br><code>file:///opt/sharcnet/abaqus/2021/doc/English/DSSIMULIA_Established.htm</code>
# find a topic by clicking for example: <i>Abaqus -> Analysis -> Analysis Techniques -> Analysis Continuation Techniques</i>


</translate>

Latest revision as of 17:50, 13 November 2024

If your license has not been set up for use on an Alliance cluster, some additional configuration changes by the Alliance system administrator and your local system administrator will need to be done. Such changes are necessary to ensure the flexlm and vendor TCP ports of your Abaqus server are reachable from all cluster compute nodes when jobs are run via the queue. So we may help you get this done, write to [[Technical support|technical support]]. Please be sure to include the following three items:
* flexlm port number
* static vendor port number
* IP address of your Abaqus license server
You will then be sent a list of cluster IP addresses so that your administrator can open the local server firewall to allow connections from the cluster on both ports. Please note that a special license agreement must generally be negotiated and signed by SIMULIA and your institution before a local license may be used remotely on Alliance hardware.
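Once the firewall has been opened, a plain TCP probe from a login or compute node is a quick way to confirm a license port is reachable. This sketch uses bash's <code>/dev/tcp</code>; the host and port are placeholders to be replaced with your flexlm server values:

```bash
#!/bin/bash
# Probe a TCP port; prints "reachable" or "unreachable".
check_port() {
  local host=$1 port=$2
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} reachable"
  else
    echo "${host}:${port} unreachable"
  fi
}
check_port 127.0.0.1 9   # placeholder; port 9 (discard) is normally closed
```

Run it once against the flexlm port and once against the static vendor port, since both must be open for jobs to check out licenses.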

= Cluster job submission =

Below are prototype Slurm scripts for submitting thread and mpi-based parallel simulations to single or multiple compute nodes. Most users will find it sufficient to use one of the project directory scripts provided in the Single node computing sections. The optional <code>memory=</code> argument found in the last line of the scripts is intended for larger memory or problematic jobs, where the 3072MB offset value may require tuning. A listing of all Abaqus command line arguments can be obtained by loading an Abaqus module and running <code>abaqus -help | less</code>.

Single node jobs that run less than one day should find the project directory script located in the first tab sufficient. However, single node jobs that run for more than a day should use one of the restart scripts. Jobs that create large restart files will benefit by writing to local disk through the use of the SLURM_TMPDIR environment variable utilized in the temporary directory scripts provided in the two rightmost tabs of the single node standard and explicit analysis sections. The restart scripts shown here will continue jobs that have been terminated early for some reason. Such job failures can occur if a job reaches its maximum requested runtime before completing and is killed by the queue, or if the compute node the job was running on crashes due to an unexpected hardware failure. Other restart types are possible by further tailoring of the input file (not shown here) to continue a job with additional steps or change the analysis (see the documentation for version-specific details).

Jobs that require large memory or larger compute resources (beyond that which a single compute node can provide) should use the mpi scripts in the multiple node sections below to distribute computing over arbitrary node ranges determined automatically by the scheduler. Short scaling test jobs should be run to determine wall-clock times (and memory requirements) as a function of the number of cores (2, 4, 8, etc.) to determine the optimal number before running any long jobs.
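Scaling tests can be organized as a simple submission loop. The sketch below is a dry run that only prints the <code>sbatch</code> commands rather than submitting them; the script name is taken from the multiple node section below, and the job names are hypothetical:

```bash
#!/bin/bash
# Dry-run scaling test: print sbatch commands for increasing core counts.
# Remove the echo to actually submit; compare wall-clock times afterwards.
for n in 2 4 8 16; do
  echo "sbatch --ntasks=${n} --job-name=scale-${n} scriptsp1-mpi.txt"
done
```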

== Standard analysis ==

Abaqus solvers support thread-based and mpi-based parallelization. Scripts for each type are provided below for running Standard Analysis type jobs on Single or Multiple nodes respectively. Scripts to perform multiple node job restarts are not currently provided.

=== Single node computing ===

{{File
|name=scriptsp1.txt
|lang="bash"
|contents=
#!/bin/bash
#SBATCH --account=def-group    # Specify account
#SBATCH --time=00-06:00        # Specify days-hrs:mins
#SBATCH --cpus-per-task=4      # Specify number of cores
#SBATCH --mem=8G               # Specify total memory > 5G
#SBATCH --nodes=1              # Do not change !

module load StdEnv/2020        # Latest installed version
module load abaqus/2021        # Latest installed version

#module load StdEnv/2016       # Uncomment to use
#module load abaqus/2020       # Uncomment to use

unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "LM_LICENSE_FILE=$LM_LICENSE_FILE"
echo "ABAQUSLM_LICENSE_FILE=$ABAQUSLM_LICENSE_FILE"

rm -f testsp1* testsp2*
abaqus job=testsp1 input=mystd-sim.inp \
   scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE interactive \
   mp_mode=threads memory="$((${SLURM_MEM_PER_NODE}-3072))MB"
}}


To write restart data every N=12 time increments specify in the input file:

*RESTART, WRITE, OVERLAY, FREQUENCY=12

To write restart data for a total of 12 time increments specify instead:

*RESTART, WRITE, OVERLAY, NUMBER INTERVAL=12, TIME MARKS=NO

To check for completed restart information do:

egrep -i "step|start" testsp*.com testsp*.msg testsp*.sta

Some simulations may benefit from adding the following to the Abaqus command at the bottom of the script:

order_parallel=OFF
{{File
|name=scriptsp2.txt
|lang="bash"
|contents=
#!/bin/bash
#SBATCH --account=def-group    # Specify account
#SBATCH --time=00-06:00        # Specify days-hrs:mins
#SBATCH --cpus-per-task=4      # Specify number of cores
#SBATCH --mem=8G               # Specify total memory > 5G
#SBATCH --nodes=1              # Do not change !

module load StdEnv/2020        # Latest installed version
module load abaqus/2021        # Latest installed version

unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "LM_LICENSE_FILE=$LM_LICENSE_FILE"
echo "ABAQUSLM_LICENSE_FILE=$ABAQUSLM_LICENSE_FILE"

rm -f testsp2* testsp1.lck
abaqus job=testsp2 oldjob=testsp1 input=mystd-sim-restart.inp \
   scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE interactive \
   mp_mode=threads memory="$((${SLURM_MEM_PER_NODE}-3072))MB"
}}


The restart input file should contain:

*HEADING
*RESTART, READ
{{File
|name=scriptst1.txt
|lang="bash"
|contents=
#!/bin/bash
#SBATCH --account=def-group    # Specify account
#SBATCH --time=00-06:00        # Specify days-hrs:mins
#SBATCH --cpus-per-task=4      # Specify number of cores
#SBATCH --mem=8G               # Specify total memory > 5G
#SBATCH --nodes=1              # Do not change !

module load StdEnv/2020        # Latest installed version
module load abaqus/2021        # Latest installed version

unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "LM_LICENSE_FILE=$LM_LICENSE_FILE"
echo "ABAQUSLM_LICENSE_FILE=$ABAQUSLM_LICENSE_FILE"
echo "SLURM_SUBMIT_DIR =" $SLURM_SUBMIT_DIR
echo "SLURM_TMPDIR = " $SLURM_TMPDIR

rm -f testst1* testst2*
cd $SLURM_TMPDIR
while sleep 6h; do
   cp -f * $SLURM_SUBMIT_DIR 2>/dev/null
done &
WPID=$!
abaqus job=testst1 input=$SLURM_SUBMIT_DIR/mystd-sim.inp \
   scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE interactive \
   mp_mode=threads memory="$((${SLURM_MEM_PER_NODE}-3072))MB"
{ kill $WPID && wait $WPID; } 2>/dev/null
cp -f * $SLURM_SUBMIT_DIR
}}


To write restart data every N=12 time increments specify in the input file:

*RESTART, WRITE, OVERLAY, FREQUENCY=12

To write restart data for a total of 12 time increments specify instead:

*RESTART, WRITE, OVERLAY, NUMBER INTERVAL=12, TIME MARKS=NO

To check the completed restart information do:

egrep -i "step|start" testst*.com testst*.msg testst*.sta
File : "scriptst2.txt"

#!/bin/bash
#SBATCH --account=def-group    # Specify account
#SBATCH --time=00-06:00        # Specify days-hrs:mins
#SBATCH --cpus-per-task=4      # Specify number of cores
#SBATCH --mem=8G               # Specify total memory > 5G
#SBATCH --nodes=1              # Do not change !

module load StdEnv/2020        # Latest installed version
module load abaqus/2021        # Latest installed version

unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "LM_LICENSE_FILE=$LM_LICENSE_FILE"
echo "ABAQUSLM_LICENSE_FILE=$ABAQUSLM_LICENSE_FILE"
echo "SLURM_SUBMIT_DIR =" $SLURM_SUBMIT_DIR
echo "SLURM_TMPDIR = " $SLURM_TMPDIR

rm -f testst2* testst1.lck
cp testst1* $SLURM_TMPDIR
cd $SLURM_TMPDIR
while sleep 3h; do
   cp -f testst2* $SLURM_SUBMIT_DIR 2>/dev/null
done &
WPID=$!
abaqus job=testst2 oldjob=testst1 input=$SLURM_SUBMIT_DIR/mystd-sim-restart.inp \
   scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE interactive \
   mp_mode=threads memory="$((${SLURM_MEM_PER_NODE}-3072))MB"
{ kill $WPID && wait $WPID; } 2>/dev/null
cp -f testst2* $SLURM_SUBMIT_DIR


The restart input file should contain:

*HEADING
*RESTART, READ

Multiple node computing

Users with large memory or compute needs (and correspondingly large licenses) can use the following script to perform MPI-based computations over an arbitrary range of nodes, ideally left to the scheduler to determine automatically. A companion template script for multi-node restart jobs is not currently provided, due to additional limitations on when such jobs can be used.
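The dense host-list construction in the script below can be illustrated in isolation. This sketch (with hypothetical node names) shows how a flat MPI host list, one entry per rank as printed by slurm_hl2hl.py, is condensed into the mp_host_list format Abaqus expects:

```shell
# Assumed input: a flat host list with one entry per MPI rank, as
# slurm_hl2hl.py --format MPIHOSTLIST would print for two 4-core
# nodes (node names here are hypothetical).
nodes="nodeA nodeA nodeA nodeA nodeB nodeB nodeB nodeB"

unset hostlist
for i in $(echo "$nodes" | xargs -n1 | uniq); do
  count=$(echo "$nodes" | xargs -n1 | grep -c "^${i}$")
  hostlist="${hostlist}['${i}',${count}],"
done
hostlist="${hostlist%,}"          # drop the trailing comma
echo "mp_host_list=[${hostlist}]"
# prints: mp_host_list=[['nodeA',4],['nodeB',4]]
```

The resulting line is both exported into the environment and written to abaqus_v6.env, which is how the script passes the node layout to the Abaqus MPI launcher.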


File : "scriptsp1-mpi.txt"

#!/bin/bash
#SBATCH --account=def-group    # Specify account
#SBATCH --time=00-06:00        # Specify days-hrs:mins
# SBATCH --nodes=2             # Best to leave commented
#SBATCH --ntasks=8             # Specify number of cores
#SBATCH --mem-per-cpu=16G      # Specify memory per core
#SBATCH --cpus-per-task=1      # Do not change !

module load StdEnv/2020        # Latest installed version
module load abaqus/2021        # Latest installed version

unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "LM_LICENSE_FILE=$LM_LICENSE_FILE"
echo "ABAQUSLM_LICENSE_FILE=$ABAQUSLM_LICENSE_FILE"

rm -f testsp1-mpi*

unset hostlist
nodes="$(slurm_hl2hl.py --format MPIHOSTLIST | xargs)"
for i in `echo "$nodes" | xargs -n1 | uniq`; do hostlist=${hostlist}$(echo "['${i}',$(echo "$nodes" | xargs -n1 | grep $i | wc -l)],"); done
hostlist="$(echo "$hostlist" | sed 's/,$//g')"
mphostlist="mp_host_list=[$(echo "$hostlist")]"
export $mphostlist
echo "$mphostlist" > abaqus_v6.env

abaqus job=testsp1-mpi input=mystd-sim.inp \
  scratch=$SCRATCH cpus=$SLURM_NTASKS interactive mp_mode=mpi


Explicit analysis

Abaqus solvers support both thread-based and MPI-based parallelization. Scripts for each type are provided below for running explicit analysis jobs on single or multiple nodes, respectively. Template scripts to perform multi-node job restarts are not currently provided pending further testing.

Single node computing

File : "scriptep1.txt"

#!/bin/bash
#SBATCH --account=def-group    # specify account
#SBATCH --time=00-06:00        # days-hrs:mins
#SBATCH --mem=8000M            # node memory > 5G
#SBATCH --cpus-per-task=4      # number cores > 1
#SBATCH --nodes=1              # do not change

module load abaqus/2021

unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "LM_LICENSE_FILE=$LM_LICENSE_FILE"
echo "ABAQUSLM_LICENSE_FILE=$ABAQUSLM_LICENSE_FILE"

rm -f testep1* testep2*
abaqus job=testep1 input=myexp-sim.inp \
   scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE interactive \
   mp_mode=threads memory="$((${SLURM_MEM_PER_NODE}-3072))MB"


To write restart data for a total of 12 time increments specify in the input file:

*RESTART, WRITE, OVERLAY, NUMBER INTERVAL=12, TIME MARKS=NO

Check for completed restart information in relevant output files:

egrep -i "step|restart" testep*.com testep*.msg testep*.sta
File : "scriptep2.txt"

#!/bin/bash
#SBATCH --account=def-group    # specify account
#SBATCH --time=00-06:00        # days-hrs:mins
#SBATCH --mem=8000M            # node memory > 5G
#SBATCH --cpus-per-task=4      # number cores > 1
#SBATCH --nodes=1              # do not change

module load abaqus/2021

unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "LM_LICENSE_FILE=$LM_LICENSE_FILE"
echo "ABAQUSLM_LICENSE_FILE=$ABAQUSLM_LICENSE_FILE"

rm -f testep2* testep1.lck
for f in testep1*; do [[ -f ${f} ]] && cp -a "$f" "testep2${f#testep1}"; done
abaqus job=testep2 input=myexp-sim.inp recover \
   scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE interactive \
   mp_mode=threads memory="$((${SLURM_MEM_PER_NODE}-3072))MB"


No input file modifications are required to restart the analysis.

File : "scriptet1.txt"

#!/bin/bash
#SBATCH --account=def-group    # specify account
#SBATCH --time=00-06:00        # days-hrs:mins
#SBATCH --mem=8000M            # node memory > 5G
#SBATCH --cpus-per-task=4      # number cores > 1
#SBATCH --nodes=1              # do not change

module load abaqus/2021

unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "LM_LICENSE_FILE=$LM_LICENSE_FILE"
echo "ABAQUSLM_LICENSE_FILE=$ABAQUSLM_LICENSE_FILE"
echo "SLURM_SUBMIT_DIR =" $SLURM_SUBMIT_DIR
echo "SLURM_TMPDIR = " $SLURM_TMPDIR

rm -f testet1* testet2*
cd $SLURM_TMPDIR
while sleep 6h; do
   cp -f * $SLURM_SUBMIT_DIR 2>/dev/null
done &
WPID=$!
abaqus job=testet1 input=$SLURM_SUBMIT_DIR/myexp-sim.inp \
   scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE interactive \
   mp_mode=threads memory="$((${SLURM_MEM_PER_NODE}-3072))MB"
{ kill $WPID && wait $WPID; } 2>/dev/null
cp -f * $SLURM_SUBMIT_DIR


To write restart data for a total of 12 time increments specify in the input file:

*RESTART, WRITE, OVERLAY, NUMBER INTERVAL=12, TIME MARKS=NO

Check for completed restart information in relevant output files:

egrep -i "step|restart" testet*.com testet*.msg testet*.sta
File : "scriptet2.txt"

#!/bin/bash
#SBATCH --account=def-group    # specify account
#SBATCH --time=00-06:00        # days-hrs:mins
#SBATCH --mem=8000M            # node memory > 5G
#SBATCH --cpus-per-task=4      # number cores > 1
#SBATCH --nodes=1              # do not change

module load abaqus/2021

unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
echo "LM_LICENSE_FILE=$LM_LICENSE_FILE"
echo "ABAQUSLM_LICENSE_FILE=$ABAQUSLM_LICENSE_FILE"
echo "SLURM_SUBMIT_DIR =" $SLURM_SUBMIT_DIR
echo "SLURM_TMPDIR = " $SLURM_TMPDIR

rm -f testet2* testet1.lck
for f in testet1*; do [[ -f ${f} ]] && cp -a "$f" $SLURM_TMPDIR/"testet2${f#testet1}"; done
cd $SLURM_TMPDIR
while sleep 3h; do
   cp -f * $SLURM_SUBMIT_DIR 2>/dev/null
done &
WPID=$!
abaqus job=testet2 input=$SLURM_SUBMIT_DIR/myexp-sim.inp recover \
   scratch=$SCRATCH cpus=$SLURM_CPUS_ON_NODE interactive \
   mp_mode=threads memory="$((${SLURM_MEM_PER_NODE}-3072))MB"
{ kill $WPID && wait $WPID; } 2>/dev/null
cp -f  * $SLURM_SUBMIT_DIR


No input file modifications are required to restart the analysis.


Multiple node computing

File : "scriptep1-mpi.txt"

#!/bin/bash
#SBATCH --account=def-group    # Specify account
#SBATCH --time=00-06:00        # Specify days-hrs:mins
#SBATCH --ntasks=8             # Specify number of cores
#SBATCH --mem-per-cpu=16000M   # Specify memory per core
# SBATCH --nodes=2             # Specify number of nodes (optional)
#SBATCH --cpus-per-task=1      # Do not change !

module load StdEnv/2020        # Latest installed version
module load abaqus/2021        # Latest installed version

unset SLURM_GTIDS
export MPI_IC_ORDER='tcp'
export I_MPI_HYDRA_TOPOLIB=ipl # required when using abaqus/2021
echo "LM_LICENSE_FILE=$LM_LICENSE_FILE"
echo "ABAQUSLM_LICENSE_FILE=$ABAQUSLM_LICENSE_FILE"

rm -f testep1-mpi*

unset hostlist
nodes="$(slurm_hl2hl.py --format MPIHOSTLIST | xargs)"
for i in `echo "$nodes" | xargs -n1 | uniq`; do hostlist=${hostlist}$(echo "['${i}',$(echo "$nodes" | xargs -n1 | grep $i | wc -l)],"); done
hostlist="$(echo "$hostlist" | sed 's/,$//g')"
mphostlist="mp_host_list=[$(echo "$hostlist")]"
export $mphostlist
echo "$mphostlist" > abaqus_v6.env

abaqus job=testep1-mpi input=myexp-sim.inp \
  scratch=$SCRATCH cpus=$SLURM_NTASKS interactive mp_mode=mpi


Node memory

An estimate of the total Slurm node memory (--mem=) required for a simulation to run fully in RAM (without being virtualized to scratch disk) can be obtained by examining the Abaqus output test.dat file. For example, a simulation that requires a fairly large amount of memory might show:

                   M E M O R Y   E S T I M A T E
  
 PROCESS      FLOATING PT       MINIMUM MEMORY        MEMORY TO
              OPERATIONS           REQUIRED          MINIMIZE I/O
             PER ITERATION           (MB)               (MB)
  
     1          1.89E+14             3612              96345
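The estimate block can be located in the .dat file with a simple grep. The sketch below writes a miniature stand-in file first so it is self-contained; a real job would produce a file named after the job, e.g. testsp1.dat:

```shell
# Create a miniature stand-in for an Abaqus .dat file (real runs
# produce one named after the job, e.g. testsp1.dat).
cat > sample.dat <<'EOF'
                   M E M O R Y   E S T I M A T E

 PROCESS      FLOATING PT       MINIMUM MEMORY        MEMORY TO
              OPERATIONS           REQUIRED          MINIMIZE I/O
             PER ITERATION           (MB)               (MB)

     1          1.89E+14             3612              96345
EOF

# Print the estimate block: the matching line plus the next 6 lines.
grep -A 6 "M E M O R Y   E S T I M A T E" sample.dat
```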

To run your simulation interactively and monitor the memory consumption, do the following:

1) ssh into a cluster, obtain an allocation on a compute node (such as gra100), and run Abaqus, e.g.

[name@server ~]$ module load abaqus/6.14.1  OR  module load abaqus/2020
[name@server ~]$ unset SLURM_GTIDS


2) ssh into the cluster again, ssh into the compute node with the allocation, and run top, e.g.

[name@server ~]$ ssh gra100
[name@server ~]$ top -u $USER


3) watch the VIRT and RES columns until steady peak memory values are observed

To fully satisfy the recommended MEMORY TO MINIMIZE I/O (MRMIO) value, at least the same amount of non-swapped physical memory (RES) must be available to Abaqus. Since RES will in general be less than the virtual memory (VIRT) by some relatively constant amount for a given simulation, it is necessary to slightly over-allocate the requested Slurm node memory --mem=. In the above sample Slurm scripts, this over-allocation has been hardcoded to a conservative value of 3072MB based on initial testing of the standard Abaqus solver.

To avoid the long queue wait times associated with large values of MRMIO, it may be worth investigating the performance impact of reducing the RES memory made available to Abaqus significantly below the MRMIO. This can be done by lowering the --mem= value, which in turn will set an artificially low value of memory= in the Abaqus command (found in the last line of the Slurm script). In doing so, be careful that RES does not dip below the MINIMUM MEMORY REQUIRED (MMR), otherwise Abaqus will exit due to Out of Memory (OOM). As an example, if your MRMIO is 96GB, try running a series of short test jobs with #SBATCH --mem=8G, 16G, 32G, 64G until an acceptably small performance impact is found, noting that smaller values will result in increasingly large scratch space used by temporary files.
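As a concrete illustration of the over-allocation arithmetic used by the scripts above (the 3072MB margin is the value hardcoded in the sample scripts; inside a real job, SLURM_MEM_PER_NODE is set by Slurm rather than by hand):

```shell
# Inside a job, Slurm sets SLURM_MEM_PER_NODE (in MB) from --mem=.
# Here it is assigned by hand to mimic --mem=8G.
SLURM_MEM_PER_NODE=8192

# The scripts pass node memory minus a conservative 3072 MB margin
# to Abaqus, leaving headroom for the gap between RES and VIRT.
ABQ_MEM="$((SLURM_MEM_PER_NODE - 3072))MB"
echo "memory=${ABQ_MEM}"
# prints: memory=5120MB
```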

Graphical use

Abaqus can be run interactively in graphical mode on a cluster or on gra-vdi over VNC by following these steps:

On a cluster

1. Connect to a compute node (3hr salloc time limit) with TigerVNC
2. Open a new terminal window and enter the following

module load StdEnv/2020 abaqus/2021

3. Start the application with

abaqus cae -mesa

On gra-vdi

1. Connect to gra-vdi with TigerVNC

2. Open a new terminal window and enter one of the following:

module load CcEnv StdEnv/2016 abaqus/6.14.1 or
module load CcEnv StdEnv/2016 abaqus/2020 or
module load CcEnv StdEnv/2020 abaqus/2021

3. Start the application with

abaqus cae

For Abaqus to start in GUI mode, there must be at least one free cae license (not in use). The SHARCNET license has 2 free and 2 reserved licenses. If all 4 are in use, according to:

[gra-vdi3:~] abaqus licensing lmstat -c $ABAQUSLM_LICENSE_FILE -a | grep "Users of cae"
Users of cae:  (Total of 4 licenses issued;  Total of 4 licenses in use)

Then the following error messages will occur when you attempt to start abaqus cae:

[gra-vdi3:~] abaqus cae
ABAQUSLM_LICENSE_FILE=27050@license3.sharcnet.ca
/opt/sharcnet/abaqus/2020/Commands/abaqus cae
No socket connection to license server manager.
Feature:       cae
License path:  27050@license3.sharcnet.ca:
FLEXnet Licensing error:-7,96
For further information, refer to the FLEXnet Licensing documentation,
or contact your local Abaqus representative.
Number of requested licenses: 1
Number of total licenses:     4
Number of licenses in use:    2
Number of available licenses: 2
Abaqus Error: Abaqus/CAE Kernel exited with an error.

Site-specific use

SHARCNET license

SHARCNET provides a small but free license consisting of 2 cae and 35 execute tokens, with usage limits of 10 tokens/user and 15 tokens/group. For groups that have purchased dedicated tokens, the free token usage limits are added to their reservation. The free tokens are available on a first-come, first-served basis and are mainly intended for testing and light usage before deciding whether or not to purchase dedicated tokens. Costs for dedicated tokens (in 2021) were approximately CAD$110 per compute token and CAD$400 per GUI token; submit a ticket to request an official quote. The license can be used by any Alliance researcher, but only on SHARCNET hardware. Groups that purchase dedicated tokens to run on the SHARCNET license server may likewise only use them on SHARCNET hardware, including gra-vdi (for running Abaqus in full graphical mode) and the Graham or Dusky clusters (for submitting compute batch jobs to the queue). Before you can use the license you must contact Technical support and request access. In your email, 1) mention that it is for use on SHARCNET systems, and 2) include a copy/paste of the following License Agreement statement with your full name and username entered in the indicated locations. Please note that every user must do this; it cannot be done once for an entire group, and this includes PIs who have purchased their own dedicated tokens.

License agreement

----------------------------------------------------------------------------------
Subject: Abaqus SHARCNET Academic License User Agreement

This email is to confirm that I "_____________" with username "___________" will
only use “SIMULIA Academic Software” with tokens from the SHARCNET license server
for the following purposes:

1) on SHARCNET hardware where the software is already installed
2) in affiliation with a Canadian degree-granting academic institution
3) for education, institutional or instruction purposes and not for any commercial
   or contract-related purposes where results are not publishable
4) for experimental, theoretical and/or digital research work, undertaken primarily
   to acquire new knowledge of the underlying foundations of phenomena and observable
   facts, up to the point of proof-of-concept in a laboratory    
-----------------------------------------------------------------------------------

Configure license file

Configure your license file as follows, noting that it is only usable on SHARCNET systems: Graham, gra-vdi and Dusky.

[gra-login1:~] cat ~/.licenses/abaqus.lic
prepend_path("LM_LICENSE_FILE","27050@license3.sharcnet.ca")
prepend_path("ABAQUSLM_LICENSE_FILE","27050@license3.sharcnet.ca")

If your Abaqus jobs fail with the error message [*** ABAQUS/eliT_CheckLicense rank 0 terminated by signal 11 (Segmentation fault)] in the Slurm output file, verify that your abaqus.lic file contains the ABAQUSLM_LICENSE_FILE line, which is required to use abaqus/2020. If your Abaqus jobs fail with an error message starting [License server machine is down or not responding, etc.] in the output file, verify that your abaqus.lic file contains the LM_LICENSE_FILE line, which is required to use abaqus/6.14.1, as shown. Since the abaqus.lic file shown above contains both lines, you should not see either problem.

Query license server

Log into Graham, load Abaqus, and then run one of the following:

ssh graham.alliancecan.ca
module load StdEnv/2020
module load abaqus

I) Check the SHARCNET license server for started and queued jobs:

abaqus licensing lmstat -c $LM_LICENSE_FILE -a | egrep "Users|start|queued"

II) Check the SHARCNET license server for started and queued jobs also showing reservations by purchasing groups:

abaqus licensing lmstat -c $LM_LICENSE_FILE -a | egrep "Users|start|queued|RESERVATION"

III) Check the SHARCNET license server for only cae, standard and explicit product availability:

abaqus licensing lmstat -c $LM_LICENSE_FILE -a | grep "Users of" | egrep "cae|standard|explicit"

When the output of query I) above indicates that a job for a particular username is queued, this means the job has entered the "R"unning state from the perspective of squeue -j jobid or sacct -j jobid, and is therefore sitting idle on a compute node waiting for a license. This will have the same impact on your account priority as if the job were performing computations and consuming CPU time. Eventually, when sufficient licenses become available, the queued job will start. To demonstrate, the following shows the license server and queue output for a situation where a user submits two jobs, but only the first job acquires enough licenses to start:

[roberpj@dus241:~] sq
         JOBID     USER      ACCOUNT           NAME  ST  TIME_LEFT  NODES  CPUS  MIN_MEM  NODELIST (REASON) 
         29801  roberpj  def-roberpj  scriptep1.txt   R    2:59:18      1    12       8G   dus47 (None) 
         29802  roberpj  def-roberpj  scriptsp1.txt   R    2:59:33      1    12       8G   dus28 (None) 
[roberpj@dus241:~] abaqus licensing lmstat -c $LM_LICENSE_FILE -a | egrep "Users|start|queued|RESERVATION"
 Users of abaqus:  (Total of 78 licenses issued;  Total of 71 licenses in use)
     roberpj dus47 /dev/tty (v62.2) (license3.sharcnet.ca/27050 275), start Thu 8/27 5:45, 14 licenses
     roberpj dus28 /dev/tty (v62.2) (license3.sharcnet.ca/27050 729) queued for 14 licenses

Specify job resources

To ensure optimal usage of both your Abaqus tokens and our resources, it is important to carefully specify the required memory and ncpus in your Slurm script. The values can be determined by submitting a few short test jobs to the queue and then checking their utilization.

For completed jobs, use seff JobNumber to show the total Memory Utilized and Memory Efficiency. If the Memory Efficiency is less than ~90%, decrease the value of the #SBATCH --mem= setting in your Slurm script accordingly. The seff JobNumber command also shows the total CPU (time) Utilized and CPU Efficiency; if the CPU Efficiency is less than ~90%, perform scaling tests to determine the optimal number of CPUs and then update the value of #SBATCH --cpus-per-task= in your Slurm script.

For running jobs, use the srun --jobid=29821580 --pty top -d 5 -u $USER command to watch the %CPU, %MEM and RES for each Abaqus parent process on the compute node. The %CPU and %MEM columns display the percent usage relative to the total available on the node, while the RES column shows the per-process resident memory size (in human-readable format for values over 1GB). Further information on how to monitor jobs is available on our documentation wiki.

Core token mapping

TOKENS 5  6  7  8  10  12  14  16  19  21  25  28  34  38
CORES  1  2  3  4   6   8  12  16  24  32  48  64  96 128

where TOKENS = floor[5 × CORES^0.422]
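The table follows directly from this formula, so the token count for a core count not listed can be evaluated with a short awk one-liner (a sketch for convenience, not an official tool):

```shell
# Compute Abaqus execute tokens for a given core count using
# TOKENS = floor[5 * CORES^0.422].
tokens() { awk -v c="$1" 'BEGIN { printf "%d\n", int(5 * c^0.422) }'; }

tokens 1    # -> 5
tokens 16   # -> 16
tokens 128  # -> 38
```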

Western license

The Western site license may only be used by Western researchers on hardware located at Western's campus. Currently, the Dusky cluster is the only system that satisfies these conditions. Graham and gra-vdi are excluded since they are located on Waterloo's campus. Contact the Western Abaqus license server administrator <jmilner@robarts.ca> to inquire about using the Western Abaqus license. You will need to provide your username and possibly make arrangements to purchase tokens. If you are granted access then you may proceed to configure your abaqus.lic file to point to the Western license server as follows:

Configure license file

Configure your license file as follows, noting that it is only usable on Dusky.

[dus241:~] cat .licenses/abaqus.lic
prepend_path("LM_LICENSE_FILE","27000@license4.sharcnet.ca")
prepend_path("ABAQUSLM_LICENSE_FILE","27000@license4.sharcnet.ca")

Once configured, submit your job as described in the Cluster job submission section above. If there are any problems, submit a problem ticket to technical support. Specify that you are using the Abaqus Western license on Dusky and provide the failed job number, along with a paste of any error messages, as applicable.

Online documentation

The full Abaqus documentation (latest version) can be accessed on gra-vdi as shown in the following steps.

Account preparation:

  1. connect to gra-vdi.computecanada.ca with tigervnc as described here
  2. open a terminal window on gra-vdi and type firefox (hit enter)
  3. in the address bar type about:config (hit enter) -> click the I accept the risk! button
  4. in the search bar type unique then double click privacy.file_unique_origin to change true to false

View documentation:

  1. connect to gra-vdi.computecanada.ca with tigervnc as described here
  2. open a terminal window on gra-vdi and type firefox (hit enter)
  3. in the address bar, copy/paste one of the following:
    file:///opt/sharcnet/abaqus/2020/doc/English/DSSIMULIA_Established.htm, or
    file:///opt/sharcnet/abaqus/2021/doc/English/DSSIMULIA_Established.htm
  4. find a topic by clicking for example: Abaqus -> Analysis -> Analysis Techniques -> Analysis Continuation Techniques