AlphaFold: Difference between revisions
Revision as of 14:43, 28 June 2022
AlphaFold is a machine-learning model for the prediction of protein folding.
This page discusses how to use AlphaFold v2.0, the version that was entered in CASP14 and published in Nature.
Source code and documentation for AlphaFold can be found on its GitHub page (https://github.com/deepmind/alphafold). Any publication that discloses findings arising from using this source code or the model parameters should cite the AlphaFold paper (https://doi.org/10.1038/s41586-021-03819-2).
Using Python wheel
Available wheels
You can list available wheels using the avail_wheels command:
[name@server ~]$ avail_wheels alphafold
name version python arch
--------- --------- -------- -------
alphafold 2.2.2 py3 generic
Installing AlphaFold in a Python virtual environment
1. Load AlphaFold dependencies:
[name@server ~]$ module load gcc/9.3.0 openmpi/4.0.3 cuda/11.4 cudnn/8.2.0 kalign/2.03 hmmer/3.2.1 openmm-alphafold/7.5.1 hh-suite/3.3.0 python/3.8
Only Python 3.7 and 3.8 are currently supported.
2. Create a Python virtual environment and activate it:
[name@server ~]$ virtualenv --no-download ~/alphafold_env
[name@server ~]$ source ~/alphafold_env/bin/activate
3. Install a specific version of AlphaFold and its Python dependencies:
(alphafold_env) [name@server ~]$ pip install --no-index --upgrade pip
(alphafold_env) [name@server ~]$ pip install --no-index alphafold==2.2.2
4. Validate the installation:
(alphafold_env) [name@server ~]$ run_alphafold.py --help
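Beyond checking that the help text prints, it can be useful to confirm that the external tools AlphaFold calls are on your PATH once the modules are loaded. The following is a minimal sketch, not part of AlphaFold; the function name missing_tools is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of AlphaFold): print every tool from the
# argument list that cannot be found on PATH; return 1 if any is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}
```

With the modules loaded and the virtual environment activated, you could call it as: missing_tools jackhmmer hhblits hhsearch kalign run_alphafold.py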
Databases
Note that AlphaFold requires a set of databases to be downloaded into your $SCRATCH space.
Important: The databases must reside in $SCRATCH.
1. From a login node, create the data folder:
(alphafold_env) [name@server ~]$ export DOWNLOAD_DIR=$SCRATCH/alphafold/data
(alphafold_env) [name@server ~]$ mkdir -p $DOWNLOAD_DIR
2. With your virtual environment activated, you can download the data:
(alphafold_env) [name@server ~]$ download_all_data.sh $DOWNLOAD_DIR
Note that this step cannot be done from a compute node; it must be run from a login node. Since the download might take a while, we suggest starting it in a screen or Tmux session.
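Because the unpacked databases occupy roughly 2.2 TB, it may be worth checking the available space before starting the download. The sketch below is an illustration only; has_free_space is a hypothetical helper, and it relies on POSIX df, which reports filesystem-level free space and does not know about your personal quota.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the filesystem holding directory $1 has at
# least $2 kilobytes available, according to POSIX `df -Pk`.
has_free_space() {
    # Line 2 of `df -P` output is the filesystem entry; column 4 is "Available".
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$2" ]
}
```

For example: has_free_space "$SCRATCH" 2200000000 || echo "Less than about 2.2 TB free in $SCRATCH"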
Afterwards, the structure of your data should be similar to:
(alphafold_env) [name@server ~]$ tree -d $DOWNLOAD_DIR
$DOWNLOAD_DIR/                  # Total: ~ 2.2 TB (download: 428 GB)
    bfd/                        # ~ 1.8 TB (download: 271.6 GB)
        # 6 files.
    mgnify/                     # ~ 64 GB (download: 32.9 GB)
        mgy_clusters_2018_12.fa
    params/                     # ~ 3.5 GB (download: 3.5 GB)
        # 5 CASP14 models,
        # 5 pTM models,
        # LICENSE,
        # = 11 files.
    pdb70/                      # ~ 56 GB (download: 19.5 GB)
        # 9 files.
    pdb_mmcif/                  # ~ 206 GB (download: 46 GB)
        mmcif_files/
            # About 180,000 .cif files.
        obsolete.dat
    uniclust30/                 # ~ 87 GB (download: 24.9 GB)
        uniclust30_2018_08/
            # 13 files.
    uniref90/                   # ~ 59 GB (download: 29.7 GB)
        uniref90.fasta
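The layout above can be checked automatically before submitting a job. The following is a minimal sketch under the assumption that your download produced the directories listed above; check_alphafold_data is a hypothetical helper, and it only tests that directories exist, not that their contents are complete.

```shell
#!/bin/sh
# Hypothetical helper: verify that the database directories listed above
# exist under the directory given as $1. Prints each missing one and
# returns 1 if anything is missing.
check_alphafold_data() {
    rc=0
    for sub in bfd mgnify params pdb70 pdb_mmcif/mmcif_files uniclust30 uniref90; do
        if [ ! -d "$1/$sub" ]; then
            echo "missing directory: $1/$sub"
            rc=1
        fi
    done
    return "$rc"
}
```

For example: check_alphafold_data "$DOWNLOAD_DIR" && echo "database layout looks complete"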
Running AlphaFold
AlphaFold is hardcoded to use at most 8 CPU cores, since it does not benefit from using more.
Edit the following submission script to suit your needs. Two variants are given: the first uses CPUs only, and the second also requests a GPU.
#!/bin/bash
#SBATCH --job-name=alphafold_run
#SBATCH --account=def-someprof # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=08:00:00 # adjust this to match the walltime of your job
#SBATCH --cpus-per-task=8 # a MAXIMUM of 8 cores; AlphaFold does not benefit from more
#SBATCH --mem=20G # adjust this according to the memory you need
# Load module dependencies
module load gcc/9.3.0 openmpi/4.0.3 cuda/11.4 cudnn/8.2.0 kalign/2.03 hmmer/3.2.1 openmm-alphafold/7.5.1 hh-suite/3.3.0 python/3.8
DOWNLOAD_DIR=$SCRATCH/alphafold/data # set the appropriate path to your downloaded data
INPUT_DIR=$SCRATCH/alphafold/input # set the appropriate path to your input data (FASTA files)
OUTPUT_DIR=${SCRATCH}/alphafold/output # set the appropriate path for your output data
# Generate your virtual environment in $SLURM_TMPDIR
virtualenv --no-download ${SLURM_TMPDIR}/env
source ${SLURM_TMPDIR}/env/bin/activate
# Install alphafold and its dependencies
pip install --no-index --upgrade pip
pip install --no-index alphafold==2.2.2
# Edit with the proper arguments, run your commands
# run_alphafold.py --help
run_alphafold.py \
--data_dir=${DOWNLOAD_DIR} \
--fasta_paths=${INPUT_DIR}/YourSequence.fasta,${INPUT_DIR}/AnotherSequence.fasta \
--bfd_database_path=${DOWNLOAD_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb70_database_path=${DOWNLOAD_DIR}/pdb70/pdb70 \
--template_mmcif_dir=${DOWNLOAD_DIR}/pdb_mmcif/mmcif_files \
--uniclust30_database_path=${DOWNLOAD_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--uniref90_database_path=${DOWNLOAD_DIR}/uniref90/uniref90.fasta \
--hhblits_binary_path=${EBROOTHHMINSUITE}/bin/hhblits \
--hhsearch_binary_path=${EBROOTHHMINSUITE}/bin/hhsearch \
--jackhmmer_binary_path=${EBROOTHMMER}/bin/jackhmmer \
--kalign_binary_path=${EBROOTKALIGN}/bin/kalign \
--mgnify_database_path=${DOWNLOAD_DIR}/mgnify/mgy_clusters_2018_12.fa \
--output_dir=${OUTPUT_DIR} \
--obsolete_pdbs_path=${DOWNLOAD_DIR}/pdb_mmcif/obsolete.dat \
--max_template_date=2020-05-14 \
--model_preset=monomer_casp14 \
--use_gpu_relax=False
#!/bin/bash
#SBATCH --job-name=alphafold_run
#SBATCH --account=def-someprof # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=08:00:00 # adjust this to match the walltime of your job
#SBATCH --gres=gpu:1 # a GPU helps to accelerate the inference part only
#SBATCH --cpus-per-task=8 # a MAXIMUM of 8 cores; AlphaFold does not benefit from more
#SBATCH --mem=20G # adjust this according to the memory you need
# Load module dependencies
module load gcc/9.3.0 openmpi/4.0.3 cuda/11.4 cudnn/8.2.0 kalign/2.03 hmmer/3.2.1 openmm-alphafold/7.5.1 hh-suite/3.3.0 python/3.8
DOWNLOAD_DIR=$SCRATCH/alphafold/data # set the appropriate path to your downloaded data
INPUT_DIR=$SCRATCH/alphafold/input # set the appropriate path to your input data (FASTA files)
OUTPUT_DIR=${SCRATCH}/alphafold/output # set the appropriate path for your output data
# Generate your virtual environment in $SLURM_TMPDIR
virtualenv --no-download ${SLURM_TMPDIR}/env
source ${SLURM_TMPDIR}/env/bin/activate
# Install alphafold and its dependencies
pip install --no-index --upgrade pip
pip install --no-index alphafold==2.2.2
# Edit with the proper arguments, run your commands
# run_alphafold.py --help
run_alphafold.py \
--data_dir=${DOWNLOAD_DIR} \
--fasta_paths=${INPUT_DIR}/YourSequence.fasta,${INPUT_DIR}/AnotherSequence.fasta \
--bfd_database_path=${DOWNLOAD_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb70_database_path=${DOWNLOAD_DIR}/pdb70/pdb70 \
--template_mmcif_dir=${DOWNLOAD_DIR}/pdb_mmcif/mmcif_files \
--uniclust30_database_path=${DOWNLOAD_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--uniref90_database_path=${DOWNLOAD_DIR}/uniref90/uniref90.fasta \
--hhblits_binary_path=${EBROOTHHMINSUITE}/bin/hhblits \
--hhsearch_binary_path=${EBROOTHHMINSUITE}/bin/hhsearch \
--jackhmmer_binary_path=${EBROOTHMMER}/bin/jackhmmer \
--kalign_binary_path=${EBROOTKALIGN}/bin/kalign \
--mgnify_database_path=${DOWNLOAD_DIR}/mgnify/mgy_clusters_2018_12.fa \
--output_dir=${OUTPUT_DIR} \
--obsolete_pdbs_path=${DOWNLOAD_DIR}/pdb_mmcif/obsolete.dat \
--max_template_date=2020-05-14 \
--model_preset=monomer_casp14 \
--use_gpu_relax=True
Then submit the job to the scheduler:
(alphafold_env) [name@server ~]$ sbatch --job-name alphafold-X alphafold-gpu.sh
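The --fasta_paths option in the scripts above takes a comma-separated list of files. If you have many sequences in your input directory, the list can be built rather than typed by hand. This is a sketch only; join_fasta_paths is a hypothetical helper, and it assumes that file names end in .fasta and contain no commas.

```shell
#!/bin/sh
# Hypothetical helper: print all *.fasta files in directory $1 as one
# comma-separated string suitable for --fasta_paths.
# Returns 1 (and prints nothing) if the directory holds no .fasta files.
join_fasta_paths() {
    joined=""
    for f in "$1"/*.fasta; do
        [ -e "$f" ] || continue          # the glob matched nothing
        joined="${joined:+$joined,}$f"   # add a comma only between entries
    done
    [ -n "$joined" ] && printf '%s\n' "$joined"
}
```

Inside a submission script, it could be used as: --fasta_paths="$(join_fasta_paths "$INPUT_DIR")"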
Using Singularity
The AlphaFold documentation explains how to run the software using Docker. We do not provide Docker, but instead provide Singularity. However, we recommend using a virtual environment and a Python wheel from the Compute Canada "wheelhouse", as described above.
First read our Singularity documentation as there are particularities of each cluster that one must take into account. Then build a Singularity container:
[name@server ~]$ cd $SCRATCH
[name@server ~]$ module load singularity
[name@server ~]$ singularity build alphafold.sif docker://uvarc/alphafold:2.2.0
Running AlphaFold within Singularity
AlphaFold is hardcoded to use at most 8 CPU cores, since it does not benefit from using more.
Create a directory alphafold_output to hold the output files:
[name@server ~]$ mkdir $SCRATCH/alphafold_output
Then edit the job submission script:
#!/bin/bash
#SBATCH --job-name alphafold-run
#SBATCH --account=def-someprof # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=08:00:00 # adjust this to match the walltime of your job
#SBATCH --gres=gpu:1 # a GPU helps to accelerate the inference part only
#SBATCH --cpus-per-task=8 # a MAXIMUM of 8 cores; AlphaFold does not benefit from more
#SBATCH --mem=20G # adjust this according to the memory you need
module load singularity
export PYTHONNOUSERSITE=True
ALPHAFOLD_DATA_PATH=/path/to/alphafold/databases
ALPHAFOLD_MODELS=/path/to/alphafold/databases/params
# Run the command
singularity run --nv \
-B $ALPHAFOLD_DATA_PATH:/data \
-B $ALPHAFOLD_MODELS \
-B .:/etc \
--pwd /app/alphafold alphafold.sif \
--fasta_paths=/path/to/input.fasta \
--uniref90_database_path=/data/uniref90/uniref90.fasta \
--data_dir=/data \
--mgnify_database_path=/data/mgnify/mgy_clusters.fa \
--bfd_database_path=/data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniclust30_database_path=/data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--pdb70_database_path=/data/pdb70/pdb70 \
--template_mmcif_dir=/data/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/data/pdb_mmcif/obsolete.dat \
--max_template_date=2020-05-14 \
--output_dir=alphafold_output \
--model_names='model_1' \
--preset=casp14 \
--use_gpu_relax=True
AlphaFold launches a multithreaded analysis using up to 8 CPUs before running model inference on the GPU.
Memory requirements vary with protein size.
The -B .:/etc option bind-mounts the current working directory to /etc inside the container for the cache file ld.so.cache. The --nv flag enables GPU support. Submit this job script (alpharun_jobscript.sh) using the Slurm sbatch command:
[name@server ~]$ sbatch alpharun_jobscript.sh
On successful completion, the output directory should have the following files:
[name@server ~]$ tree alphafold_output/input
alphafold_output
└── input
    ├── features.pkl
    ├── msas
    │   ├── bfd_uniclust_hits.a3m
    │   ├── mgnify_hits.sto
    │   └── uniref90_hits.sto
    ├── ranked_0.pdb
    ├── ranking_debug.json
    ├── relaxed_model_1.pdb
    ├── result_model_1.pkl
    ├── timings.json
    └── unrelaxed_model_1.pdb
2 directories, 10 files
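A quick way to tell whether a run finished is to look for the summary files shown in the listing above. The following sketch assumes that listing is representative of a completed run; check_alphafold_output is a hypothetical helper, and a passing check does not say anything about prediction quality.

```shell
#!/bin/sh
# Hypothetical helper: check that a per-target output directory ($1)
# contains non-empty copies of the summary files shown in the listing above.
check_alphafold_output() {
    rc=0
    for f in features.pkl ranked_0.pdb ranking_debug.json timings.json; do
        if [ ! -s "$1/$f" ]; then
            echo "missing or empty: $1/$f"
            rc=1
        fi
    done
    return "$rc"
}
```

For example: check_alphafold_output alphafold_output/input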