rsnt_translations
56,573
edits
No edit summary |
No edit summary |
||
Line 377: | Line 377: | ||
<!--T:226--> | <!--T:226--> | ||
The following two scripts are provided to automate restarting very large jobs that require more than the typical | The following two scripts are provided to automate restarting very large jobs that require more than the typical seven-day maximum runtime window available on most clusters. Jobs are restarted from the most recent saved time step files. A fundamental requirement is that the first time step can be completed within the requested job array time limit (specified at the top of your Slurm script) when starting a simulation from an initialized solution field. It is assumed that a standard fixed time step size is being used. To begin, a working set of sample.cas, sample.dat and sample.jou files must be present. Next, edit your sample.jou file to contain <code>/solve/dual-time-iterate 1</code> and <code>/file/auto-save/data-frequency 1</code>. Then create a restart journal file by doing <code>cp sample.jou sample-restart.jou</code> and edit the sample-restart.jou file to contain <code>/file/read-cas-data sample-restart</code> instead of <code>/file/read-cas-data sample</code> and comment out the initialization line with a semicolon, for instance <code>;/solve/initialize/initialize-flow</code>. If your second and subsequent time steps are known to run twice as fast as the initial time step, edit sample-restart.jou to specify <code>/solve/dual-time-iterate 2</code>. By doing this, the solution will only be restarted after two time steps are completed following the initial time step. An output file for each time step will still be saved in the output subdirectory. The value 2 is arbitrary but should be chosen such that the time for 2 steps fits within the job array time limit. Doing this will minimize the number of solution restarts, which are computationally expensive. If your first time step performed by sample.jou starts from a converged (previous) solution, choose 1 instead of 2 since likely all time steps will require a similar amount of wall time to complete. 
Assuming 2 is chosen, the total time of simulation to be completed will be 1*Dt+2*Nrestart*Dt where Dt is the fixed time step size and Nrestart is the number of solution restarts specified in the script. The total number of time steps (and hence the number of output files generated) will therefore be 1+2*Nrestart. The value for the time resource request should be chosen so the initial time step and subsequent time steps will complete comfortably within the Slurm time window, which can be specified up to a maximum of seven days with <code>#SBATCH --time=07-00:00</code>. | ||
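The arithmetic above can be checked with a short sketch before submitting. The function names, the example step size, and the wall time figures below are hypothetical and are not part of the provided scripts. The time limit check also ignores the cost of reading the case and data files at each restart, so leave a margin.

```python
def restart_plan(dt, n_restart, steps_per_restart=2):
    """Return (total_steps, total_simulated_time).

    One initial time step is followed by n_restart restarts, each of
    which advances steps_per_restart time steps of fixed size dt.
    """
    total_steps = 1 + steps_per_restart * n_restart
    return total_steps, total_steps * dt


def fits_time_limit(first_step_hours, later_step_hours,
                    steps_per_restart=2, limit_hours=7 * 24):
    """Check that both the initial job and each restart job fit the limit.

    limit_hours defaults to seven days, matching --time=07-00:00.
    Restart overhead (reading case/data files) is NOT included.
    """
    initial_ok = first_step_hours <= limit_hours
    restart_ok = steps_per_restart * later_step_hours <= limit_hours
    return initial_ok and restart_ok


# Hypothetical example: Dt = 0.5 (simulation time units), 10 restarts.
steps, sim_time = restart_plan(dt=0.5, n_restart=10)
print(f"{steps} time steps, {sim_time} total simulated time")

# Hypothetical wall times: first step 150 h, later steps 75 h each.
print(fits_time_limit(first_step_hours=150, later_step_hours=75))
```

With these example numbers the plan produces 21 output files, and two later steps (150 h) still fit inside a seven-day (168 h) window.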
<!--T:3400--> | <!--T:3400--> | ||
Line 786: | Line 786: | ||
# [[Ansys#Workbench_3|Start workbench]] using the same ansys module version your project was created with | # [[Ansys#Workbench_3|Start workbench]] using the same ansys module version your project was created with | ||
# Open the project in workbench with <I>File -> Open</I> | # Open the project in workbench with <I>File -> Open</I> | ||
# Right click Setup in main window and select to <I>Clear All Generated Data</I> | # Right click Setup in the main window and select to <I>Clear All Generated Data</I> | ||
# Click <I>File -> Exit</I> in the top menu bar pulldown to exit workbench | # Click <I>File -> Exit</I> in the top menu bar pulldown to exit workbench | ||
# Press the No button in the Ansys Workbench popup when asked <I>The current project has been modified. Do you want to save it ?</I> | # Press the No button in the Ansys Workbench popup when asked <I>The current project has been modified. Do you want to save it ?</I> | ||
Line 792: | Line 792: | ||
<!--T:2845--> | <!--T:2845--> | ||
To avoid writing the solution when a running job successfully completes, remove <code>;Save(Overwrite=True)</code> from the last line of your script. Doing this will make it easier to run multiple test jobs (for scaling purposes when changing ntasks), since the initialized solution will not be overwritten each time. Alternatively, keep a copy of the initialized YOURPROJECT.wbpj file and YOURPROJECT_files | To avoid writing the solution when a running job successfully completes, remove <code>;Save(Overwrite=True)</code> from the last line of your script. Doing this will make it easier to run multiple test jobs (for scaling purposes when changing ntasks), since the initialized solution will not be overwritten each time. Alternatively, keep a copy of the initialized YOURPROJECT.wbpj file and YOURPROJECT_files subdirectory and restore them after the solution is written. For APDL-based simulations submitted under the legacy StdEnv/2016 environment, nodes=1 may be either removed from the script or changed to be greater than 1 to permit computations across multiple nodes. | ||
=== Slurm scripts === <!--T:2814--> | === Slurm scripts === <!--T:2814--> | ||
Line 1,481: | Line 1,481: | ||
<!--T:151--> | <!--T:151--> | ||
After a job completes, its "Job Wall-clock time" can be obtained from <code>seff myjobid</code>. Using this value, scaling tests can be performed by submitting short test jobs with an increasing number of cores. If the Wall-clock time decreases by ~50% when the number of cores | After a job completes, its "Job Wall-clock time" can be obtained from <code>seff myjobid</code>. Using this value, scaling tests can be performed by submitting short test jobs with an increasing number of cores. If the Wall-clock time decreases by ~50% when the number of cores is doubled, then additional cores may be considered. | ||
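As a rough illustration of the scaling test described above, the sketch below turns a set of wall-clock times into speedup and parallel efficiency relative to the smallest core count. The function name and the timing values are hypothetical. In practice the times would be copied by hand from the <code>seff</code> output of each test job.

```python
def scaling_table(timings):
    """Return [(cores, speedup, efficiency), ...] sorted by core count.

    timings maps core count -> wall-clock seconds. Speedup and
    efficiency are measured against the smallest core count tested.
    An efficiency near 1.0 after doubling the cores corresponds to
    the wall-clock time dropping by about 50%.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


# Hypothetical wall-clock times (seconds) from three short test jobs.
example = {8: 1000.0, 16: 500.0, 32: 400.0}
for cores, speedup, efficiency in scaling_table(example):
    print(f"{cores:4d} cores  speedup {speedup:.2f}  efficiency {efficiency:.2f}")
```

In this made-up example, going from 8 to 16 cores halves the wall-clock time, while going from 16 to 32 cores only reduces it by 20%, which suggests that 16 cores is the more reasonable request.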
= Online documentation = <!--T:8--> | = Online documentation = <!--T:8--> | ||
The full Ansys documentation for versions back to 19.2 can be accessed by following these steps: | The full Ansys documentation for versions back to 19.2 can be accessed by following these steps: | ||
# Connect to <b>gra-vdi.computecanada.ca</b> with tigervnc as described in [https://docs.computecanada.ca/wiki/VNC#VDI_Nodes VDI Nodes]. | # Connect to <b>gra-vdi.computecanada.ca</b> with tigervnc as described in [https://docs.computecanada.ca/wiki/VNC#VDI_Nodes VDI Nodes]. | ||
# If the Firefox browser or the Ansys Workbench | # If the Firefox browser or the Ansys Workbench is open, close it now. | ||
# Start Firefox by clicking <I>Applications -> Internet -> Firefox</I>. | # Start Firefox by clicking <I>Applications -> Internet -> Firefox</I>. | ||
# Open a <b><i>new</I></b> terminal window by clicking <I>Applications -> System Tools -> Mate Terminal</I>. | # Open a <b><i>new</I></b> terminal window by clicking <I>Applications -> System Tools -> Mate Terminal</I>. |