GAMESS-US
<languages />

<translate>
<!--T:1-->
[[Category:Software]][[Category:ComputationalChemistry]]
<!--T:2-->
The General Atomic and Molecular Electronic Structure System (GAMESS)
<ref name="homepage">GAMESS Homepage: http://www.msg.ameslab.gov/gamess/</ref>
is a general ''ab initio'' quantum chemistry package.


== Running GAMESS == <!--T:3-->

=== Job submission === <!--T:4-->
Compute Canada clusters use the Slurm scheduler. For more about submitting and monitoring jobs, see [[Running jobs]].


<!--T:5-->
The first step is to prepare a GAMESS input file containing the molecular geometry and a specification of the calculation to be carried out. Please refer to the GAMESS Documentation<ref name="gamess-docs">Documentation: http://www.msg.ameslab.gov/gamess/documentation.html</ref> and particularly Chapter 2 "Input Description"<ref name="gamess-input">Input Description (PDF): http://www.msg.ameslab.gov/gamess/GAMESS_Manual/input.pdf</ref> for a description of the file format and keywords.
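
For illustration, here is a minimal input file for a single-point RHF/6-31G energy calculation on a water molecule; the method, basis set and geometry are arbitrary choices for this sketch, not a recommendation:

{{File
  |name=name.inp
  |lang="text"
  |contents=
 $CONTRL SCFTYP=RHF RUNTYP=ENERGY $END
 $BASIS  GBASIS=N31 NGAUSS=6 $END
 $DATA
Water RHF/6-31G single-point energy
C1
O 8.0   0.0000   0.0000   0.0000
H 1.0   0.0000   0.7572   0.5865
H 1.0   0.0000  -0.7572   0.5865
 $END
}}

Note that GAMESS group names such as <code>$CONTRL</code> must not start in the first column of the line.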


<!--T:6-->
Besides your input file (in our example, "name.inp"), you have to prepare a job script to define the compute resources for the job. Input file and job script must be in the same directory.


</translate>
{{File
  |name=gamess_job.sh
  |lang="bash"
  |contents=
#!/bin/bash
#SBATCH --cpus-per-task=1       # Number of CPUs
#SBATCH --mem-per-cpu=4000M     # memory per CPU in MB
#SBATCH --time=0-00:30          # time (DD-HH:MM)

## Directory for GAMESS supplementary output files ($USERSCR):
#export USERSCR=$SCRATCH

## Directory for GAMESS temporary binary files ($SCR):
## Uncomment the following two lines to use /scratch instead of local disk
#export SCR="$SCRATCH/gamess_${SLURM_JOB_ID}/"
#mkdir -p $SCR

module load gamess-us/20170420-R1
export SLURM_CPUS_PER_TASK      # rungms will use this
rungms name.inp  &>  name.out
}}
<translate>
<!--T:10-->
Use the following command to submit the job to the scheduler:

<!--T:11-->
  sbatch gamess_job.sh

=== Scratch files === <!--T:12-->

<!--T:13-->
By default, temporary binary files (scratch files) will be written to local disk on the compute node (<code>$SLURM_TMPDIR</code>), as we expect this to give the best performance.
Please be aware that the data in <code>$SLURM_TMPDIR</code> will be '''deleted''' after the job finishes.
If there is insufficient space on the local disk, you can use /scratch instead by setting the <code>SCR</code> environment variable as shown in the comments in the example above.

<!--T:27-->
Supplementary output files are written to the location defined by the <code>USERSCR</code> environment variable. By default this is the user's <code>$SCRATCH</code> directory.
<!--T:28-->
{| class="wikitable"
|-
! Description                        !! Environment variable !! Default location
|-
| GAMESS temporary binary files      || <code>SCR</code>    || <code>$SLURM_TMPDIR</code> (node-local storage)
|-
| GAMESS supplementary output files  || <code>USERSCR</code> || <code>$SCRATCH</code> (user's SCRATCH directory)
|}
=== Running GAMESS on multiple CPUs === <!--T:14-->

<!--T:15-->
GAMESS calculations can make use of more than one CPU. The number of CPUs available for a calculation is determined by the <code>--cpus-per-task</code> setting in the job script (see the example below).

<!--T:16-->
As GAMESS has been built using [https://en.wikipedia.org/wiki/Unix_domain_socket sockets] for parallelization, it can only use CPU cores that are located on the same compute node. Therefore the maximum number of CPU cores that can be used for a job is dictated by the size of the nodes in the cluster, e.g. 32 CPU cores per node on [[Graham]].
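
For example, the resource requests in <code>gamess_job.sh</code> for an eight-core run would become (eight is an arbitrary choice for this sketch):

 #SBATCH --cpus-per-task=8       # Number of CPUs (all cores must be on one node)
 #SBATCH --mem-per-cpu=4000M     # memory per CPU in MB

The rest of the script is unchanged; <code>rungms</code> picks up the core count from the exported <code>SLURM_CPUS_PER_TASK</code> variable.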


<!--T:17-->
Quantum chemistry calculations are known not to scale as well to large numbers of CPUs as, for example, classical molecular mechanics calculations, which means that they cannot use large numbers of CPUs efficiently. Exactly how many CPUs can be used efficiently depends on the number of atoms, the number of basis functions, and the level of theory.


<!--T:18-->
To determine a reasonable number of CPUs to use, one needs to run a scaling test: that is, run the same input file using different numbers of CPUs and compare the execution times. Ideally the execution time should be half as long when using twice as many CPUs (a 100% speedup). Obviously it is not a good use of resources if a calculation runs, for example, only 30% faster when the number of CPUs is doubled, and certain calculations can even become slower when the number of CPUs is increased further.
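
One possible way to set up such a scaling test, assuming the job script and input file shown above (each run gets its own directory so that the jobs do not overwrite each other's <code>name.out</code>):

 for NCPU in 1 2 4 8; do
     mkdir -p scale-${NCPU}
     cp name.inp gamess_job.sh scale-${NCPU}/
     (cd scale-${NCPU} && sbatch --cpus-per-task=${NCPU} gamess_job.sh)
 done

Options given on the <code>sbatch</code> command line override the corresponding <code>#SBATCH</code> lines in the script. Once all jobs have finished, compare the wall-clock times reported near the end of each <code>name.out</code>.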


=== Memory === <!--T:19-->

<!--T:20-->
Quantum chemistry calculations are often "memory bound", meaning that larger molecules at higher levels of theory need a lot of memory (RAM), often much more than is available in a typical computer. Therefore packages like GAMESS use disk storage (SCRATCH) to store intermediate results, freeing up memory, and read them back from disk later in the calculation.


<!--T:21-->
Even our fastest SCRATCH storage is several orders of magnitude slower than memory, so you should make sure to assign sufficient memory to GAMESS. This is a two-step process:


<!--T:22-->
1. Request memory for the job in the submission script. Using <code>--mem-per-cpu=4000M</code> is a reasonable value, since it matches the memory-to-CPU ratio on the base nodes. Requesting more than that may cause the job to wait until it can be scheduled on a large-memory node.


<!--T:23-->
2. In the <code>$SYSTEM</code> group of the input file, define the <code>MWORDS</code> and <code>MEMDDI</code> options. This tells GAMESS how much memory it is allowed to use.
* <code>MWORDS</code> is the maximum replicated memory which a job can use, on every core. It is given in units of 1,000,000 words (as opposed to 1024*1024 words), where a word is defined as 64 bits = 8 bytes.
* <code>MEMDDI</code> is the grand total memory needed for the distributed data interface (DDI) storage, given in units of 1,000,000 words. The memory required on each processor core for a run using <tt>p</tt> CPU-cores is therefore <tt>MEMDDI/p + MWORDS</tt> (see the worked example below).
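
As a worked example (the numbers are illustrative, chosen to fit under <code>--mem-per-cpu=4000M</code>): for a job using <tt>p</tt> = 8 cores, the input file could contain

  $SYSTEM MWORDS=300 MEMDDI=800 $END

Each core then needs <tt>MEMDDI/p + MWORDS</tt> = 800/8 + 300 = 400 million words, i.e. 400 × 8 MB = 3200 MB per core, which stays below the 4000M requested per CPU with a safety margin of several hundred MB (see below).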


<!--T:24-->
Please refer to the <code>$SYSTEM</code> group section in the GAMESS documentation<ref name="gamess-input" /> if you want more details.


<!--T:25-->
It is important to leave a few hundred MB of memory between the memory requested from the scheduler and the memory that GAMESS is allowed to use, as a safety margin. If a job's output is incomplete and the <code>slurm-{JOBID}.out</code> file contains a message like "slurmstepd: error: Exceeded step/job memory limit at some point", then Slurm has terminated the job for trying to use more memory than was requested. In that case one needs to either reduce <code>MWORDS</code> or <code>MEMDDI</code> in the input file or increase <code>--mem-per-cpu</code> in the submission script.
 
== References == <!--T:26-->
<references />
</translate>
