<pre>
#!/bin/bash
#SBATCH --cpus-per-task=1    # Number of CPUs
#SBATCH --mem-per-cpu=4000M  # memory per CPU in MB
#SBATCH --time=0-00:30       # time (DD-HH:MM)
</pre>
=== Memory ===
Quantum chemistry calculations are often "memory bound": larger molecules
at high levels of theory need a lot of memory (RAM), often much more than
is available in a typical computer. Therefore QM packages like GAMESS use
disk storage (SCRATCH) to store intermediate results, freeing up memory,
and load them back at a later time.
As even our fastest SCRATCH storage is several orders of magnitude slower
than memory, one should make sure to assign sufficient memory to GAMESS.
This is a two-step process:
1. First one needs to request memory for the job via the Slurm submission
script. Using <code>--mem-per-cpu=4000M</code> is a reasonable value,
as it is compatible with the memory-to-CPU-core ratio on the base nodes.
Requesting more than that will either cause the job to wait to be
started on a large-memory node or be charged for CPUs it does not
actually use.
2. In the <code>$SYSTEM</code> group of the input file one needs to define
the <code>MWORDS</code> and <code>MEMDDI</code> options. These tell
GAMESS how much memory it is allowed to use.
<code>MWORDS</code> is the maximum replicated memory which a job can use
on every core. It is given in units of 1,000,000 words (as opposed
to 1024*1024 words), where a word is defined as 64 bits (8 bytes).
<code>MEMDDI</code> is the grand total memory needed for the distributed
data interface (DDI) storage, given in units of 1,000,000 words.
The memory required on each processor core for a run using p CPU cores
is therefore MEMDDI/p + MWORDS. Please refer to the <code>$SYSTEM</code> group
section in the GAMESS documentation<ref name="gamess-input" />.
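The per-core formula can be checked with a quick calculation. A minimal sketch, with illustrative values for <code>MWORDS</code>, <code>MEMDDI</code> and the core count (not a recommendation for any particular calculation):

```shell
#!/bin/bash
# Illustrative values, not recommendations:
MWORDS=250   # replicated memory per core, in millions of 64-bit words
MEMDDI=800   # total DDI memory, in millions of 64-bit words
P=4          # number of CPU cores used by the run

# Memory per core in millions of words: MEMDDI/P + MWORDS
WORDS_PER_CORE=$(( MEMDDI / P + MWORDS ))
# One million 64-bit words is 8,000,000 bytes, i.e. about 8 MB
MB_PER_CORE=$(( WORDS_PER_CORE * 8 ))
echo "${MB_PER_CORE}"   # prints 3600
```

With these values GAMESS would use about 3600 MB per core, which leaves a safety margin below a <code>--mem-per-cpu=4000M</code> request.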
It is important to leave a few hundred MB of memory between the memory
requested from the scheduler and the memory that GAMESS is allowed to use,
as a safety margin. If the <code>slurm-{JOBID}.out</code> file contains a
message like "slurmstepd: error: Exceeded step/job memory limit at some point",
then Slurm has terminated the job for trying to use more memory than was
requested for the job. In that case one needs to either reduce
<code>MWORDS</code> or <code>MEMDDI</code> in the input file or increase
the <code>--mem-per-cpu</code> in the submission script.
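Putting the two steps together: for a 4-core job submitted with <code>--mem-per-cpu=4000M</code>, a matching <code>$SYSTEM</code> group could look like the following (the values are an illustrative sketch, not a recommendation for any particular calculation):

```
 $SYSTEM MWORDS=250 MEMDDI=800 $END
```

Each of the 4 cores then uses up to 800/4 + 250 = 450 million words, i.e. about 3600 MB, which stays a few hundred MB below the 4000M requested from Slurm.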
== References ==
<references />