AlphaFold: Difference between revisions

From Alliance Doc

@@ Line 1: / Line 1: @@
+<languages />
+[[Category:Software]]
 <translate>
+<!--T:1-->
 [https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology AlphaFold]
-is a machine-learning model for the prediction of protein folding.
+is a machine learning model for the prediction of protein folding.
+<!--T:2-->
 This page discusses how to use AlphaFold v2.0, the version that was entered in CASP14 and published in Nature.
+<!--T:3-->
 Source code and documentation for AlphaFold can be found at their [https://github.com/deepmind/alphafold GitHub page].
-Any publication that discloses findings arising from using this source code or the model parameters should [https://github.com/deepmind/alphafold#citing-this-work cite] the [https://doi.org/10.1038/s41586-021-03819-2 AlphaFold paper].
+Any publication that discloses findings arising from use of this source code or the model parameters should [https://github.com/deepmind/alphafold#citing-this-work cite] the [https://doi.org/10.1038/s41586-021-03819-2 AlphaFold paper].
-== Using Python wheel ==
+== Available versions == <!--T:5-->
+AlphaFold is available on our clusters as prebuilt Python packages (wheels). You can list available versions with <code>avail_wheels</code>.
-=== Available wheels ===
-You can list available wheels using the <tt>avail_wheels</tt> command:
 {{Command
-|avail_wheels alphafold
+|avail_wheels alphafold --all-versions
 |result=
 name       version    python    arch
 ---------  ---------  --------  -------
+alphafold  2.3.1      py3       generic
+alphafold  2.3.0      py3       generic
+alphafold  2.2.4      py3       generic
+alphafold  2.2.3      py3       generic
 alphafold  2.2.2      py3       generic
+alphafold  2.2.1      py3       generic
+alphafold  2.1.1      py3       generic
+alphafold  2.0.0      py3       generic
 }}
-=== Installing AlphaFold in a Python virtual environment ===
+== Installing AlphaFold in a Python virtual environment == <!--T:6-->
-. Load AlphaFold dependencies:
+<!--T:7-->
-{{Command|module load gcc/9.3.0 openmpi/4.0.3 cuda/11.4 cudnn/8.2.0 kalign/2.03 hmmer/3.2.1 openmm-alphafold/7.5.1 hh-suite/3.3.0 python/3.8
+. Load AlphaFold dependencies.
+{{Command|module load StdEnv/2020 gcc/9.3.0 openmpi/4.0.3 cuda/11.4 cudnn/8.2.0 kalign/2.03 hmmer/3.2.1 openmm-alphafold/7.5.1 hh-suite/3.3.0 python/3.8
 }}
-Only python 3.7 and 3.8 are currently supported.
+As of July 2022, only Python 3.7 and 3.8 are supported.
-. Create a Python virtual environment and activate it:
+<!--T:8-->
-{{Commands2
+. Create and activate a Python virtual environment.
+{{Commands
 |virtualenv --no-download ~/alphafold_env
 |source ~/alphafold_env/bin/activate
 }}
-. Install a specific version of AlphaFold and its python dependencies:
+<!--T:9-->
-{{Commands2
+. Install a specific version of AlphaFold and its Python dependencies.
+{{Commands
 |prompt=(alphafold_env) [name@server ~]
 |pip install --no-index --upgrade pip
-|pip install --no-index alphafold{{=}}{{=}}2.2.2
+|pip install --no-index alphafold{{=}}{{=}}X.Y.Z
 }}
+where <code>X.Y.Z</code> is the exact desired version, for instance <code>2.2.4</code>.
+You can omit the version to install the latest one available from the wheelhouse.
-. Validate it
+<!--T:10-->
+. Validate it.
 {{Command
 |prompt=(alphafold_env) [name@server ~]
@@ Line 47: / Line 64: @@
 }}
-=== Databases ===
+<!--T:45-->
-Note that AlphaFold requires a set of datasets/databases to be downloaded into the <tt>$SCRATCH</tt>.
+. Freeze the environment and requirements set.
+{{Command
+|prompt=(alphafold_env) [name@server ~]
+|pip freeze > ~/alphafold-requirements.txt
+}}
-'''Important:''' The database must live in the <tt>$SCRATCH</tt>
+== Databases == <!--T:11-->
+Note that AlphaFold requires a set of databases.
+<!--T:65-->
+The databases are available in
+<code>/cvmfs/bio.data.computecanada.ca/content/databases/Core/alphafold2_dbs/</code>.
+<!--T:63-->
+AlphaFold databases on CVMFS undergo yearly updates. In January 2024, the database was updated and is accessible in folder <code>2024_01</code>.
+{{Command
+|prompt=(alphafold_env) [name@server ~]
+|export DOWNLOAD_DIR{{=}}/cvmfs/bio.data.computecanada.ca/content/databases/Core/alphafold2_dbs/2024_01/
+}}
+<!--T:66-->
+You can also choose to download the databases locally into your <code>$SCRATCH</code> directory.
+<!--T:12-->
+<b>Important:</b> The databases must live in the <code>$SCRATCH</code> directory.
+<!--T:13-->
 <tabs>
 <tab name="General">
-. From a login node, create the data folder:
+. From a DTN or login node, create the data folder.
-{{Commands2
+{{Commands
 |prompt=(alphafold_env) [name@server ~]
 |export DOWNLOAD_DIR{{=}}$SCRATCH/alphafold/data
@@ Line 61: / Line 101: @@
 }}
-. With your virtual environment activated, you can download the data:
+<!--T:14-->
+. With your modules loaded and virtual environment activated, you can download the data.
 {{Command
 |prompt=(alphafold_env) [name@server ~]
@@ Line 67: / Line 108: @@
 }}
-Note that this step '''cannot''' be done from compute nodes but rather from a login node. Since the download might take a while we suggest to start the download in a [https://linuxize.com/post/how-to-use-linux-screen/ screen] or [https://docs.computecanada.ca/wiki/Tmux Tmux] session.
+<!--T:15-->
+Note that this step <b>cannot</b> be done from a compute node. It should be done on a data transfer node (DTN) on clusters that have them (see [[Transferring data]]). On clusters that have no DTN, use a login node instead. Since the download can take up to a full day, we suggest using a [[Prolonging_terminal_sessions#Terminal_multiplexers|terminal multiplexer]]. You may encounter a <code>Client_loop: send disconnect: Broken pipe</code> error message. See [[AlphaFold#Broken pipe error message|Troubleshooting]] below.
+<!--T:67-->
 </tab>
+<!--T:16-->
 <tab name="Graham only">
-. Set <tt>DOWNLOAD_DIR</tt>:
+. Set <code>DOWNLOAD_DIR</code>.
 {{Command
 |prompt=(alphafold_env) [name@server ~]
 |export DOWNLOAD_DIR{{=}}/datashare/alphafold
 }}
+<!--T:62-->
 </tab>
 </tabs>
-Afterwards, the structure of your data should be similar to:
+<!--T:47-->
+Afterwards, the structure of your data should be similar to
+<tabs>
+<tab name=2.3>
+{{Command
+|prompt=(alphafold_env) [name@server ~]
+|tree -d $DOWNLOAD_DIR
+|result=
+$DOWNLOAD_DIR/                             # ~ 2.6 TB (total)
+    bfd/                                   # ~ 1.8 TB
+        # 6 files
+    mgnify/                                # ~ 120 GB
+        mgy_clusters.fa
+    params/                                # ~ 5.3 GB
+        # LICENSE
+        # 15 models
+        # 16 files (total)
+    pdb70/                                 # ~ 56 GB
+        # 9 files
+    pdb_mmcif/                             # ~ 246 GB
+        mmcif_files/
+            # 202,764 files
+        obsolete.dat
+    pdb_seqres/                            # ~ 237 MB
+        pdb_seqres.txt
+    uniprot/                               # ~ 111 GB
+        uniprot.fasta
+    uniref30/                              # ~ 206 GB
+        # 7 files
+    uniref90/                              # ~ 73 GB
+        uniref90.fasta
+}}
+</tab>
+<!--T:17-->
+<tab name=2.2>
 {{Command
 |prompt=(alphafold_env) [name@server ~]
@@ Line 106: / Line 189: @@
          uniref90.fasta
 }}
+</tab>
+</tabs>
-=== Running AlphaFold ===
+== Running AlphaFold == <!--T:18-->
 {{Warning
 |title=Performance
-|content=AlphaFold has at most 8 cpus hardcoded since it does not benefit from using more than 8.
+|content=You can request at most 8 CPU cores when running AlphaFold because it is hardcoded to not use more and does not benefit from using more.
 }}
-Edit to your needs the following submission script:
+<!--T:19-->
+Edit one of the following submission scripts according to your needs.
 <tabs>
-<tab name="CPU">
+<tab name="2.3 on CPU">
 {{File
-|name=alphafold-cpu.sh
+|name=alphafold-2.3-cpu.sh
 |lang="bash"
 |contents=
 #!/bin/bash
+<!--T:48-->
 #SBATCH --job-name=alphafold_run
 #SBATCH --account=def-someprof    # adjust this to match the accounting group you are using to submit jobs
 #SBATCH --time=08:00:00           # adjust this to match the walltime of your job
-#SBATCH --cpus-per-task=8         # a MAXIMUM of 8 core, Alpafold has no benefit to use more
+#SBATCH --cpus-per-task=8         # a MAXIMUM of 8 core, AlphaFold has no benefit to use more
 #SBATCH --mem=20G                 # adjust this according to the memory you need
-# Load modules dependencies
+<!--T:49-->
-module load gcc/9.3.0 openmpi/4.0.3 cuda/11.4 cudnn/8.2.0 kalign/2.03 hmmer/3.2.1 openmm-alphafold/7.5.1 hh-suite/3.3.0 python/3.8
+# Load modules dependencies.
+module load StdEnv/2020 gcc/9.3.0 openmpi/4.0.3 cuda/11.4 cudnn/8.2.0 kalign/2.03 hmmer/3.2.1 openmm-alphafold/7.5.1 hh-suite/3.3.0 python/3.8
+<!--T:50-->
 DOWNLOAD_DIR=$SCRATCH/alphafold/data   # set the appropriate path to your downloaded data
-INPUT_DIR=$SCRATCH/alphafold/input     # set the appropriate path to your supporting data
+INPUT_DIR=$SCRATCH/alphafold/input     # set the appropriate path to your input data
-OUTPUT_DIR=${SCRATCH}/alphafold/output # set the appropriate path to your supporting data
+OUTPUT_DIR=${SCRATCH}/alphafold/output # set the appropriate path to your output data
-# Generate your virtual environment in $SLURM_TMPDIR
+<!--T:51-->
+# Generate your virtual environment in $SLURM_TMPDIR.
 virtualenv --no-download ${SLURM_TMPDIR}/env
 source ${SLURM_TMPDIR}/env/bin/activate
-# Install alphafold and its dependencies
+<!--T:52-->
+# Install AlphaFold and its dependencies.
 pip install --no-index --upgrade pip
-pip install --no-index alphafold==2.2.2
+pip install --no-index --requirement ~/alphafold-requirements.txt
-# Edit with the proper arguments, run your commands
+<!--T:53-->
+# Edit with the proper arguments and run your commands.
 # run_alphafold.py --help
 run_alphafold.py \
+   --fasta_paths=${INPUT_DIR}/YourSequence.fasta,${INPUT_DIR}/AnotherSequence.fasta \
+   --output_dir=${OUTPUT_DIR} \
     --data_dir=${DOWNLOAD_DIR} \
-    --fasta_paths=${INPUT_DIR}/YourSequence.fasta,${INPUT_DIR}/AnotherSequence.fasta \
+    --db_preset=full_dbs \
+   --model_preset=multimer \
     --bfd_database_path=${DOWNLOAD_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
+   --mgnify_database_path=${DOWNLOAD_DIR}/mgnify/mgy_clusters_2022_05.fa \
     --pdb70_database_path=${DOWNLOAD_DIR}/pdb70/pdb70 \
     --template_mmcif_dir=${DOWNLOAD_DIR}/pdb_mmcif/mmcif_files \
-    --uniclust30_database_path=${DOWNLOAD_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08  \
+    --obsolete_pdbs_path=${DOWNLOAD_DIR}/pdb_mmcif/obsolete.dat \
-    --uniref90_database_path=${DOWNLOAD_DIR}/uniref90/uniref90.fasta  \
+   --pdb_seqres_database_path=${DOWNLOAD_DIR}/pdb_seqres/pdb_seqres.txt \
+   --uniprot_database_path=${DOWNLOAD_DIR}/uniprot/uniprot.fasta \
+   --uniref30_database_path=${DOWNLOAD_DIR}/uniref30/UniRef30_2021_03 \
+    --uniref90_database_path=${DOWNLOAD_DIR}/uniref90/uniref90.fasta \
     --hhblits_binary_path=${EBROOTHHMINSUITE}/bin/hhblits \
     --hhsearch_binary_path=${EBROOTHHMINSUITE}/bin/hhsearch \
     --jackhmmer_binary_path=${EBROOTHMMER}/bin/jackhmmer \
     --kalign_binary_path=${EBROOTKALIGN}/bin/kalign \
-   --mgnify_database_path=${DOWNLOAD_DIR}/mgnify/mgy_clusters_2018_12.fa \
+    --max_template_date=2022-01-01 \
-   --output_dir=${OUTPUT_DIR} \
+    --use_gpu_relax=False
-   --obsolete_pdbs_path=${DOWNLOAD_DIR}/pdb_mmcif/obsolete.dat \
-    --max_template_date=2020-05-14 \
-    --model_preset=monomer_casp14
 }}
 </tab>
-<tab name="GPU">
+<!--T:54-->
+<tab name="2.3 on GPU">
 {{File
-|name=alphafold-gpu.sh
+|name=alphafold-2.3-gpu.sh
 |lang="bash"
 |contents=
 #!/bin/bash
+<!--T:55-->
 #SBATCH --job-name=alphafold_run
 #SBATCH --account=def-someprof    # adjust this to match the accounting group you are using to submit jobs
 #SBATCH --time=08:00:00           # adjust this to match the walltime of your job
+#SBATCH --cpus-per-task=8         # a MAXIMUM of 8 core, AlphaFold has no benefit to use more
 #SBATCH --gres=gpu:1              # a GPU helps to accelerate the inference part only
-#SBATCH --cpus-per-task=8         # a MAXIMUM of 8 core, Alpafold has no benefit to use more
 #SBATCH --mem=20G                 # adjust this according to the memory you need
-# Load modules dependencies
+<!--T:56-->
-module load gcc/9.3.0 openmpi/4.0.3 cuda/11.4 cudnn/8.2.0 kalign/2.03 hmmer/3.2.1 openmm-alphafold/7.5.1 hh-suite/3.3.0 python/3.8
+# Load modules dependencies.
+module load StdEnv/2020 gcc/9.3.0 openmpi/4.0.3 cuda/11.4 cudnn/8.2.0 kalign/2.03 hmmer/3.2.1 openmm-alphafold/7.5.1 hh-suite/3.3.0 python/3.8
+<!--T:57-->
 DOWNLOAD_DIR=$SCRATCH/alphafold/data   # set the appropriate path to your downloaded data
-INPUT_DIR=$SCRATCH/alphafold/input     # set the appropriate path to your supporting data
+INPUT_DIR=$SCRATCH/alphafold/input     # set the appropriate path to your input data
-OUTPUT_DIR=${SCRATCH}/alphafold/output # set the appropriate path to your supporting data
+OUTPUT_DIR=${SCRATCH}/alphafold/output # set the appropriate path to your output data
-# Generate your virtual environment in $SLURM_TMPDIR
+<!--T:58-->
+# Generate your virtual environment in $SLURM_TMPDIR.
 virtualenv --no-download ${SLURM_TMPDIR}/env
 source ${SLURM_TMPDIR}/env/bin/activate
-# Install alphafold and its dependencies
+<!--T:59-->
+# Install AlphaFold and its dependencies.
 pip install --no-index --upgrade pip
-pip install --no-index alphafold==2.2.2
+pip install --no-index --requirement ~/alphafold-requirements.txt
-# Edit with the proper arguments, run your commands
+<!--T:60-->
+# Edit with the proper arguments and run your commands.
 # run_alphafold.py --help
 run_alphafold.py \
+   --fasta_paths=${INPUT_DIR}/YourSequence.fasta,${INPUT_DIR}/AnotherSequence.fasta \
+   --output_dir=${OUTPUT_DIR} \
     --data_dir=${DOWNLOAD_DIR} \
-    --fasta_paths=${INPUT_DIR}/YourSequence.fasta,${INPUT_DIR}/AnotherSequence.fasta \
+    --db_preset=full_dbs \
+   --model_preset=multimer \
     --bfd_database_path=${DOWNLOAD_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
+   --mgnify_database_path=${DOWNLOAD_DIR}/mgnify/mgy_clusters_2022_05.fa \
     --pdb70_database_path=${DOWNLOAD_DIR}/pdb70/pdb70 \
     --template_mmcif_dir=${DOWNLOAD_DIR}/pdb_mmcif/mmcif_files \
-    --uniclust30_database_path=${DOWNLOAD_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08  \
+    --obsolete_pdbs_path=${DOWNLOAD_DIR}/pdb_mmcif/obsolete.dat \
-    --uniref90_database_path=${DOWNLOAD_DIR}/uniref90/uniref90.fasta  \
+   --pdb_seqres_database_path=${DOWNLOAD_DIR}/pdb_seqres/pdb_seqres.txt \
+   --uniprot_database_path=${DOWNLOAD_DIR}/uniprot/uniprot.fasta \
+   --uniref30_database_path=${DOWNLOAD_DIR}/uniref30/UniRef30_2021_03 \
+    --uniref90_database_path=${DOWNLOAD_DIR}/uniref90/uniref90.fasta \
     --hhblits_binary_path=${EBROOTHHMINSUITE}/bin/hhblits \
     --hhsearch_binary_path=${EBROOTHHMINSUITE}/bin/hhsearch \
     --jackhmmer_binary_path=${EBROOTHMMER}/bin/jackhmmer \
     --kalign_binary_path=${EBROOTKALIGN}/bin/kalign \
-   --mgnify_database_path=${DOWNLOAD_DIR}/mgnify/mgy_clusters_2018_12.fa \
+    --max_template_date=2022-01-01 \
-   --output_dir=${OUTPUT_DIR} \
-   --obsolete_pdbs_path=${DOWNLOAD_DIR}/pdb_mmcif/obsolete.dat \
-    --max_template_date=2020-05-14 \
-   --model_preset=monomer_casp14 \
     --use_gpu_relax=True
 }}
 </tab>
-</tabs>
-Then submit the job to the scheduler:
+<!--T:61-->
-{{Command
+<tab name="2.2 on CPU">
-|prompt=(alphafold_env) [name@server ~]
+{{File
-|sbatch --job-name alphafold-X alphafold-gpu.sh
+|name=alphafold-cpu.sh
-}}
+|lang="bash"
+|contents=
+#!/bin/bash
+<!--T:20-->
+#SBATCH --job-name=alphafold_run
+#SBATCH --account=def-someprof    # adjust this to match the accounting group you are using to submit jobs
+#SBATCH --time=08:00:00           # adjust this to match the walltime of your job
+#SBATCH --cpus-per-task=8         # a MAXIMUM of 8 core, AlphaFold has no benefit to use more
+#SBATCH --mem=20G                 # adjust this according to the memory you need
+<!--T:21-->
+# Load modules dependencies.
+module load StdEnv/2020 gcc/9.3.0 openmpi/4.0.3 cuda/11.4 cudnn/8.2.0 kalign/2.03 hmmer/3.2.1 openmm-alphafold/7.5.1 hh-suite/3.3.0 python/3.8
+<!--T:22-->
+DOWNLOAD_DIR=$SCRATCH/alphafold/data   # set the appropriate path to your downloaded data
+INPUT_DIR=$SCRATCH/alphafold/input     # set the appropriate path to your input data
+OUTPUT_DIR=${SCRATCH}/alphafold/output # set the appropriate path to your output data
-== Using Singularity ==
+<!--T:23-->
-AlphaFold documentation explains how to run the software using Docker. We do not provide Docker, but instead provide [[Singularity]]. It is recommended to use a virtual environment and a Python wheel available from the Compute Canada "wheelhouse".
+# Generate your virtual environment in $SLURM_TMPDIR.
+virtualenv --no-download ${SLURM_TMPDIR}/env
+source ${SLURM_TMPDIR}/env/bin/activate
-First read our [[Singularity]] documentation as there are particularities of each cluster that one must take into account. Then [[Singularity#Creating_images_on_Compute_Canada_clusters| build a Singularity container]]:
+<!--T:24-->
-{{Commands2
+# Install AlphaFold and its dependencies.
-|cd $SCRATCH
+pip install --no-index --upgrade pip
-|module load singularity
+pip install --no-index --requirement ~/alphafold-requirements.txt
-|singularity build alphafold.sif docker://uvarc/alphafold:2.2.0
-}}
-=== Running AlphaFold within Singularity ===
+<!--T:25-->
-{{Warning
+# Edit with the proper arguments and run your commands.
-|title=Performance
+# Note that the `--uniclust30_database_path` option below was renamed to
-|content=AlphaFold has at most 8 cpus hardcoded since it does not benefit from using more than 8.
+# `--uniref30_database_path` in 2.3.
-}}
+# run_alphafold.py --help
-Create a directory <tt>alphafold_output</tt> to hold the output files:
+run_alphafold.py \
-{{Command
+   --fasta_paths=${INPUT_DIR}/YourSequence.fasta,${INPUT_DIR}/AnotherSequence.fasta \
-|mkdir $SCRATCH/alphafold_output
+   --output_dir=${OUTPUT_DIR} \
+   --data_dir=${DOWNLOAD_DIR} \
+   --model_preset=monomer_casp14 \
+   --bfd_database_path=${DOWNLOAD_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
+   --mgnify_database_path=${DOWNLOAD_DIR}/mgnify/mgy_clusters_2018_12.fa \
+   --pdb70_database_path=${DOWNLOAD_DIR}/pdb70/pdb70 \
+   --template_mmcif_dir=${DOWNLOAD_DIR}/pdb_mmcif/mmcif_files \
+   --obsolete_pdbs_path=${DOWNLOAD_DIR}/pdb_mmcif/obsolete.dat \
+   --uniclust30_database_path=${DOWNLOAD_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08  \
+   --uniref90_database_path=${DOWNLOAD_DIR}/uniref90/uniref90.fasta  \
+   --hhblits_binary_path=${EBROOTHHMINSUITE}/bin/hhblits \
+   --hhsearch_binary_path=${EBROOTHHMINSUITE}/bin/hhsearch \
+   --jackhmmer_binary_path=${EBROOTHMMER}/bin/jackhmmer \
+   --kalign_binary_path=${EBROOTKALIGN}/bin/kalign \
+   --max_template_date=2020-05-14 \
+   --use_gpu_relax=False
 }}
+</tab>
-Then edit the job submission script:
+<!--T:26-->
+<tab name="2.2 on GPU">
 {{File
-|name=alphafold-singularity.sh
+|name=alphafold-gpu.sh
 |lang="bash"
 |contents=
 #!/bin/bash
-#SBATCH --job-name alphafold-run
+<!--T:27-->
+#SBATCH --job-name=alphafold_run
 #SBATCH --account=def-someprof    # adjust this to match the accounting group you are using to submit jobs
 #SBATCH --time=08:00:00           # adjust this to match the walltime of your job
 #SBATCH --gres=gpu:1              # a GPU helps to accelerate the inference part only
-#SBATCH --cpus-per-task=8         # a MAXIMUM of 8 core, Alpafold has no benefit to use more
+#SBATCH --cpus-per-task=8         # a MAXIMUM of 8 core, AlphaFold has no benefit to use more
 #SBATCH --mem=20G                 # adjust this according to the memory you need
-module load singularity
+<!--T:28-->
+# Load modules dependencies.
+module load StdEnv/2020 gcc/9.3.0 openmpi/4.0.3 cuda/11.4 cudnn/8.2.0 kalign/2.03 hmmer/3.2.1 openmm-alphafold/7.5.1 hh-suite/3.3.0 python/3.8
-export PYTHONNOUSERSITE=True
+<!--T:29-->
+DOWNLOAD_DIR=$SCRATCH/alphafold/data   # set the appropriate path to your downloaded data
+INPUT_DIR=$SCRATCH/alphafold/input     # set the appropriate path to your input data
+OUTPUT_DIR=${SCRATCH}/alphafold/output # set the appropriate path to your output data
-ALPHAFOLD_DATA_PATH=/path/to/alphafold/databases
+<!--T:30-->
-ALPHAFOLD_MODELS=/path/to/alphafold/databases/params
+# Generate your virtual environment in $SLURM_TMPDIR.
+virtualenv --no-download ${SLURM_TMPDIR}/env
+source ${SLURM_TMPDIR}/env/bin/activate
-# Run the command
+<!--T:31-->
-singularity run --nv \
+# Install AlphaFold and its dependencies.
-    -B $ALPHAFOLD_DATA_PATH:/data \
+pip install --no-index --upgrade pip
-    -B $ALPHAFOLD_MODELS \
+pip install --no-index --requirement ~/alphafold-requirements.txt
-    -B .:/etc \
-    --pwd  /app/alphafold alphaFold.sif \
+<!--T:32-->
-    --fasta_paths=/path/to/input.fasta  \
+# Edit with the proper arguments and run your commands.
-    --uniref90_database_path=/data/uniref90/uniref90.fasta  \
+# Note that the `--uniclust30_database_path` option below was renamed to
-    --data_dir=/data \
+# `--uniref30_database_path` in 2.3.
-    --mgnify_database_path=/data/mgnify/mgy_clusters.fa   \
+# run_alphafold.py --help
-    --bfd_database_path=/data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
+run_alphafold.py \
-    --uniclust30_database_path=/data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
+   --fasta_paths=${INPUT_DIR}/YourSequence.fasta,${INPUT_DIR}/AnotherSequence.fasta \
-    --pdb70_database_path=/data/pdb70/pdb70  \
+   --output_dir=${OUTPUT_DIR} \
-    --template_mmcif_dir=/data/pdb_mmcif/mmcif_files  \
+   --data_dir=${DOWNLOAD_DIR} \
-    --obsolete_pdbs_path=/data/pdb_mmcif/obsolete.dat \
+   --model_preset=monomer_casp14 \
-    --max_template_date=2020-05-14   \
+   --bfd_database_path=${DOWNLOAD_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
-    --output_dir=alphafold_output  \
+   --mgnify_database_path=${DOWNLOAD_DIR}/mgnify/mgy_clusters_2018_12.fa \
-    --model_names='model_1' \
+   --pdb70_database_path=${DOWNLOAD_DIR}/pdb70/pdb70 \
-    --preset=casp14 \
+   --template_mmcif_dir=${DOWNLOAD_DIR}/pdb_mmcif/mmcif_files \
-    --use_gpu_relax=True
+   --obsolete_pdbs_path=${DOWNLOAD_DIR}/pdb_mmcif/obsolete.dat \
+   --uniclust30_database_path=${DOWNLOAD_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08  \
+   --uniref90_database_path=${DOWNLOAD_DIR}/uniref90/uniref90.fasta  \
+   --hhblits_binary_path=${EBROOTHHMINSUITE}/bin/hhblits \
+   --hhsearch_binary_path=${EBROOTHHMINSUITE}/bin/hhsearch \
+   --jackhmmer_binary_path=${EBROOTHMMER}/bin/jackhmmer \
+   --kalign_binary_path=${EBROOTKALIGN}/bin/kalign \
+   --max_template_date=2020-05-14 \
+   --use_gpu_relax=True
 }}
-AlphaFold launches multithreaded analysis using up to 8 CPUs before running model inference on the GPU.
+</tab>
-Memory requirements will vary with different size proteins.
+</tabs>
-Bind-mount the current working directory to <tt>/etc</tt> inside the container for the cache file ld.so.cache [-B .:/etc]. The <tt>--nv</tt> flag is used to enable the GPU support.
+<!--T:33-->
-Submit this job script ('alpharun_jobscript.sh') using the Slurm sbatch command.
+Then, submit the job to the scheduler.
 {{Command
-|sbatch alpharun_jobscript.sh
+|prompt=(alphafold_env) [name@server ~]
+|sbatch --job-name alphafold-X alphafold-gpu.sh
 }}
-On successful completion, the output directory should have the following files:
+== Troubleshooting == <!--T:68-->
-{{Command
+=== Broken pipe error message ===
-|tree alphafold_output/input
+When downloading the database, you may encounter a <code>Client_loop: send disconnect: Broken pipe</code> error message. The exact cause of this error is hard to determine. It could be as simple as an unusually high number of users working on the login node, leaving less capacity for your data transfer.
-|result=
- alphafold_output
+<!--T:69-->
- └── input
+*One solution is to use a [[Prolonging_terminal_sessions#Terminal_multiplexers|terminal multiplexer]]. Note that you could still encounter this error message, but it is less likely.
-    ├── features.pkl
-    ├── msas
+<!--T:70-->
-    │   ├── bfd_uniclust_hits.a3m
+*A second solution is to use the database that is already present on the cluster in <code>/cvmfs/bio.data.computecanada.ca/content/databases/Core/alphafold2_dbs/2023_07/</code>.
-    │   ├── mgnify_hits.sto
-    │   └── uniref90_hits.sto
+<!--T:71-->
-    ├── ranked_0.pdb
+*Another option is to download the full database in sections. To see the available download scripts, load the modules and activate your virtual environment, then type <code>download_</code> in your terminal and press the <code>Tab</code> key twice to list them. You can then download individual sections of the database with the corresponding script, for instance <code>download_pdb.sh</code>.
-    ├── ranking_debug.json
-    ├── relaxed_model_1.pdb
-    ├── result_model_1.pkl
-    ├── timings.json
-    └── unrelaxed_model_1.pdb
-directories, 10 files
-}}
 </translate>

Latest revision as of 12:47, 1 May 2024


AlphaFold is a machine learning model for the prediction of protein folding.

This page discusses how to use AlphaFold v2.0, the version that was entered in CASP14 and published in Nature.

Source code and documentation for AlphaFold can be found at their GitHub page. Any publication that discloses findings arising from use of this source code or the model parameters should cite the AlphaFold paper.

1 Available versions
2 Installing AlphaFold in a Python virtual environment
3 Databases
4 Running AlphaFold
5 Troubleshooting
- 5.1 Broken pipe error message

Available versions

AlphaFold is available on our clusters as prebuilt Python packages (wheels). You can list available versions with avail_wheels.

[name@server ~]$ avail_wheels alphafold --all-versions
name       version    python    arch
---------  ---------  --------  -------
alphafold  2.3.1      py3       generic
alphafold  2.3.0      py3       generic
alphafold  2.2.4      py3       generic
alphafold  2.2.3      py3       generic
alphafold  2.2.2      py3       generic
alphafold  2.2.1      py3       generic
alphafold  2.1.1      py3       generic
alphafold  2.0.0      py3       generic
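
If you want a script to pick the newest version rather than reading the table by eye, the listing can be parsed. The following is a minimal sketch, assuming the two-line header shown above and GNU `sort` (for `-V`). The function name `latest_wheel_version` is hypothetical, and the sample table is copied from the listing above rather than produced by a live call.

```bash
#!/bin/bash
# Print the highest version found in an avail_wheels-style table read from stdin.
# Assumes two header lines (column names, dashes) followed by one row per wheel.
latest_wheel_version() {
    awk 'NR > 2 && NF >= 2 { print $2 }' | sort -V | tail -n 1
}

# Sample table copied from the listing above (not a live query).
sample_table='name       version    python    arch
---------  ---------  --------  -------
alphafold  2.3.1      py3       generic
alphafold  2.3.0      py3       generic
alphafold  2.2.4      py3       generic
alphafold  2.2.3      py3       generic
alphafold  2.2.2      py3       generic
alphafold  2.2.1      py3       generic
alphafold  2.1.1      py3       generic
alphafold  2.0.0      py3       generic'

latest=$(printf '%s\n' "$sample_table" | latest_wheel_version)
echo "Latest AlphaFold wheel: ${latest}"
```

On a cluster, the same function could read the output of `avail_wheels alphafold --all-versions` through a pipe instead of the sample table.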

Installing AlphaFold in a Python virtual environment

1. Load AlphaFold dependencies.

[name@server ~]$ module load StdEnv/2020 gcc/9.3.0 openmpi/4.0.3 cuda/11.4 cudnn/8.2.0 kalign/2.03 hmmer/3.2.1 openmm-alphafold/7.5.1 hh-suite/3.3.0 python/3.8

As of July 2022, only Python 3.7 and 3.8 are supported.

2. Create and activate a Python virtual environment.

[name@server ~]$ virtualenv --no-download ~/alphafold_env
[name@server ~]$ source ~/alphafold_env/bin/activate

3. Install a specific version of AlphaFold and its Python dependencies.

(alphafold_env) [name@server ~]$ pip install --no-index --upgrade pip
(alphafold_env) [name@server ~]$ pip install --no-index alphafold==X.Y.Z

where X.Y.Z is the exact desired version, for instance 2.2.4. You can omit the version to install the latest one available from the wheelhouse.

4. Validate it.

(alphafold_env) [name@server ~]$ run_alphafold.py --help

5. Freeze the environment to record the exact set of installed requirements.

(alphafold_env) [name@server ~]$ pip freeze > ~/alphafold-requirements.txt
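
Because the job scripts later install from this requirements file, it can be worth confirming which AlphaFold version it pins. The following is a minimal sketch, assuming the usual `name==version` lines that `pip freeze` writes. The function name `pinned_version` is hypothetical, and the sample file (including its package versions) is made up for illustration and stands in for `~/alphafold-requirements.txt`.

```bash
#!/bin/bash
# Print the version pinned for a package in a pip requirements file.
pinned_version() {
    local package="$1" requirements="$2"
    # Split each line on "==" and compare the package name case-insensitively.
    awk -F'==' -v pkg="$package" 'tolower($1) == tolower(pkg) { print $2 }' "$requirements"
}

# Hypothetical sample file; the versions listed here are illustrative only.
sample_requirements=$(mktemp)
printf '%s\n' 'absl-py==1.0.0' 'alphafold==2.3.1' 'numpy==1.21.2' > "$sample_requirements"

alphafold_pin=$(pinned_version alphafold "$sample_requirements")
echo "AlphaFold pinned at: ${alphafold_pin}"

rm -f "$sample_requirements"
```

With a real environment you would call `pinned_version alphafold ~/alphafold-requirements.txt` instead.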

Databases

Note that AlphaFold requires a set of databases.

The databases are available in /cvmfs/bio.data.computecanada.ca/content/databases/Core/alphafold2_dbs/.

AlphaFold databases on CVMFS undergo yearly updates. In January 2024, the database was updated and is accessible in folder 2024_01.

(alphafold_env) [name@server ~]$ export DOWNLOAD_DIR=/cvmfs/bio.data.computecanada.ca/content/databases/Core/alphafold2_dbs/2024_01/

You can also choose to download the databases locally into your $SCRATCH directory.

Important: If you download the databases yourself, they must live in the $SCRATCH directory.
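
The choice between the CVMFS copy and a local copy can be made in a script. The following is a minimal sketch, assuming the `2024_01` folder named above. The variable name `CVMFS_DBS` and the `$HOME/scratch` fallback (used only when `$SCRATCH` is unset, for example outside a cluster) are hypothetical.

```bash
#!/bin/bash
# Prefer the shared CVMFS copy of the databases when it is visible on this node.
CVMFS_DBS=/cvmfs/bio.data.computecanada.ca/content/databases/Core/alphafold2_dbs/2024_01

if [ -d "$CVMFS_DBS" ]; then
    export DOWNLOAD_DIR="$CVMFS_DBS"
else
    # Fall back to a local copy under scratch; the $HOME/scratch default is an assumption.
    export DOWNLOAD_DIR="${SCRATCH:-$HOME/scratch}/alphafold/data"
fi

echo "Using AlphaFold databases in: $DOWNLOAD_DIR"
```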

General (the Graham-only alternative follows these steps):

1. From a data transfer node (DTN) or login node, create the data folder.

(alphafold_env) [name@server ~]$ export DOWNLOAD_DIR=$SCRATCH/alphafold/data
(alphafold_env) [name@server ~]$ mkdir -p $DOWNLOAD_DIR

2. With your modules loaded and virtual environment activated, you can download the data.

(alphafold_env) [name@server ~]$ download_all_data.sh $DOWNLOAD_DIR

Note that this step cannot be done from a compute node. It should be done on a data transfer node (DTN) on clusters that have them (see Transferring data). On clusters that have no DTN, use a login node instead. Since the download can take up to a full day, we suggest using a terminal multiplexer. You may encounter a Client_loop: send disconnect: Broken pipe error message. See Troubleshooting below.
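
To avoid starting the download from a compute node by accident, a script can check whether it is running inside a Slurm job. The following is a minimal sketch: it relies on `SLURM_JOB_ID`, which Slurm sets inside jobs, and the function name `on_compute_node` is hypothetical. It cannot tell a DTN from a login node.

```bash
#!/bin/bash
# Succeed when running inside a Slurm job, i.e. on a compute node.
on_compute_node() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if on_compute_node; then
    echo "Inside a Slurm job: run the download from a DTN or login node instead." >&2
else
    echo "Not inside a Slurm job: OK to start the download."
fi
```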

Graham only:

1. Set DOWNLOAD_DIR.

(alphafold_env) [name@server ~]$ export DOWNLOAD_DIR=/datashare/alphafold

Afterwards, the structure of your data should be similar to the following. The first listing is for AlphaFold 2.3 and the second for AlphaFold 2.2.

(alphafold_env) [name@server ~]$ tree -d $DOWNLOAD_DIR
$DOWNLOAD_DIR/                             # ~ 2.6 TB (total)
    bfd/                                   # ~ 1.8 TB
        # 6 files
    mgnify/                                # ~ 120 GB
        mgy_clusters.fa
    params/                                # ~ 5.3 GB
        # LICENSE
        # 15 models
        # 16 files (total)
    pdb70/                                 # ~ 56 GB
        # 9 files
    pdb_mmcif/                             # ~ 246 GB
        mmcif_files/
            # 202,764 files
        obsolete.dat
    pdb_seqres/                            # ~ 237 MB
        pdb_seqres.txt
    uniprot/                               # ~ 111 GB
        uniprot.fasta
    uniref30/                              # ~ 206 GB
        # 7 files
    uniref90/                              # ~ 73 GB
        uniref90.fasta

(alphafold_env) [name@server ~]$ tree -d $DOWNLOAD_DIR
$DOWNLOAD_DIR/                             # Total: ~ 2.2 TB (download: 428 GB)
    bfd/                                   # ~ 1.8 TB (download: 271.6 GB)
        # 6 files.
    mgnify/                                # ~ 64 GB (download: 32.9 GB)
        mgy_clusters.fa
    params/                                # ~ 3.5 GB (download: 3.5 GB)
        # 5 CASP14 models,
        # 5 pTM models,
        # LICENSE,
        # = 11 files.
    pdb70/                                 # ~ 56 GB (download: 19.5 GB)
        # 9 files.
    pdb_mmcif/                             # ~ 206 GB (download: 46 GB)
        mmcif_files/
            # About 180,000 .cif files.
        obsolete.dat
    uniclust30/                            # ~ 87 GB (download: 24.9 GB)
        uniclust30_2018_08/
            # 13 files.
    uniref90/                              # ~ 59 GB (download: 29.7 GB)
        uniref90.fasta
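
Given the sizes listed above, it may help to check free space before starting a local download. The following is a minimal sketch, assuming the roughly 2.6 TB total of the 2.3 listing. The function name `has_free_kib` is hypothetical. Note that `df` reports free space on the filesystem as a whole and does not account for per-user quotas.

```bash
#!/bin/bash
# Succeed if the filesystem holding directory $1 has at least $2 KiB available.
has_free_kib() {
    local dir="$1" required_kib="$2" available_kib
    # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is the available space.
    available_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$available_kib" -ge "$required_kib" ]
}

# About 2.6 TB (from the 2.3 listing), expressed in KiB; an approximation.
required_kib=$((2600 * 1024 * 1024))
target_dir="${SCRATCH:-.}"   # falls back to the current directory when $SCRATCH is unset

if has_free_kib "$target_dir" "$required_kib"; then
    echo "Enough free space in $target_dir for the full download."
else
    echo "Less than about 2.6 TB free in $target_dir; consider the CVMFS copy instead."
fi
```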

Running AlphaFold

Performance

Request at most 8 CPU cores when running AlphaFold: it is hardcoded to use no more than 8 and does not benefit from additional cores.
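
A job script can warn when more cores were requested than AlphaFold will use. The following is a minimal sketch: `SLURM_CPUS_PER_TASK` is set by Slurm when `--cpus-per-task` is given, while the names `MAX_USEFUL_CORES` and `excess_cores` are hypothetical.

```bash
#!/bin/bash
MAX_USEFUL_CORES=8   # limit stated in the Performance note above

# Print how many of the requested cores exceed the limit (0 if none).
excess_cores() {
    local requested="$1"
    if [ "$requested" -gt "$MAX_USEFUL_CORES" ]; then
        echo $((requested - MAX_USEFUL_CORES))
    else
        echo 0
    fi
}

# Default to 1 when the variable is unset, e.g. outside a Slurm job.
requested_cores="${SLURM_CPUS_PER_TASK:-1}"
extra=$(excess_cores "$requested_cores")
if [ "$extra" -gt 0 ]; then
    echo "Warning: ${extra} of the ${requested_cores} requested cores will not be used by AlphaFold." >&2
fi
```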

Edit one of the following submission scripts according to your needs.

2.3 on CPU | 2.3 on GPU | 2.2 on CPU | 2.2 on GPU

File : alphafold-2.3-cpu.sh

#!/bin/bash

#SBATCH --job-name=alphafold_run
#SBATCH --account=def-someprof    # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=08:00:00           # adjust this to match the walltime of your job
#SBATCH --cpus-per-task=8         # a MAXIMUM of 8 cores; AlphaFold does not benefit from more
#SBATCH --mem=20G                 # adjust this according to the memory you need

# Load module dependencies.
module load StdEnv/2020 gcc/9.3.0 openmpi/4.0.3 cuda/11.4 cudnn/8.2.0 kalign/2.03 hmmer/3.2.1 openmm-alphafold/7.5.1 hh-suite/3.3.0 python/3.8

DOWNLOAD_DIR=$SCRATCH/alphafold/data   # set the appropriate path to your downloaded data
INPUT_DIR=$SCRATCH/alphafold/input     # set the appropriate path to your input data
OUTPUT_DIR=${SCRATCH}/alphafold/output # set the appropriate path to your output data

# Generate your virtual environment in $SLURM_TMPDIR.
virtualenv --no-download ${SLURM_TMPDIR}/env
source ${SLURM_TMPDIR}/env/bin/activate

# Install AlphaFold and its dependencies.
pip install --no-index --upgrade pip
pip install --no-index --requirement ~/alphafold-requirements.txt

# Edit with the proper arguments and run your commands.
# run_alphafold.py --help
run_alphafold.py \
   --fasta_paths=${INPUT_DIR}/YourSequence.fasta,${INPUT_DIR}/AnotherSequence.fasta \
   --output_dir=${OUTPUT_DIR} \
   --data_dir=${DOWNLOAD_DIR} \
   --db_preset=full_dbs \
   --model_preset=multimer \
   --bfd_database_path=${DOWNLOAD_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
   --mgnify_database_path=${DOWNLOAD_DIR}/mgnify/mgy_clusters_2022_05.fa \
   --pdb70_database_path=${DOWNLOAD_DIR}/pdb70/pdb70 \
   --template_mmcif_dir=${DOWNLOAD_DIR}/pdb_mmcif/mmcif_files \
   --obsolete_pdbs_path=${DOWNLOAD_DIR}/pdb_mmcif/obsolete.dat \
   --pdb_seqres_database_path=${DOWNLOAD_DIR}/pdb_seqres/pdb_seqres.txt \
   --uniprot_database_path=${DOWNLOAD_DIR}/uniprot/uniprot.fasta \
   --uniref30_database_path=${DOWNLOAD_DIR}/uniref30/UniRef30_2021_03 \
   --uniref90_database_path=${DOWNLOAD_DIR}/uniref90/uniref90.fasta \
   --hhblits_binary_path=${EBROOTHHMINSUITE}/bin/hhblits \
   --hhsearch_binary_path=${EBROOTHHMINSUITE}/bin/hhsearch \
   --jackhmmer_binary_path=${EBROOTHMMER}/bin/jackhmmer \
   --kalign_binary_path=${EBROOTKALIGN}/bin/kalign \
   --max_template_date=2022-01-01 \
   --use_gpu_relax=False

File : alphafold-2.3-gpu.sh

#!/bin/bash

#SBATCH --job-name=alphafold_run
#SBATCH --account=def-someprof    # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=08:00:00           # adjust this to match the walltime of your job
#SBATCH --cpus-per-task=8         # a MAXIMUM of 8 cores; AlphaFold does not benefit from more
#SBATCH --gres=gpu:1              # a GPU helps to accelerate the inference part only
#SBATCH --mem=20G                 # adjust this according to the memory you need

# Load module dependencies.
module load StdEnv/2020 gcc/9.3.0 openmpi/4.0.3 cuda/11.4 cudnn/8.2.0 kalign/2.03 hmmer/3.2.1 openmm-alphafold/7.5.1 hh-suite/3.3.0 python/3.8

DOWNLOAD_DIR=$SCRATCH/alphafold/data   # set the appropriate path to your downloaded data
INPUT_DIR=$SCRATCH/alphafold/input     # set the appropriate path to your input data
OUTPUT_DIR=${SCRATCH}/alphafold/output # set the appropriate path to your output data

# Generate your virtual environment in $SLURM_TMPDIR.
virtualenv --no-download ${SLURM_TMPDIR}/env
source ${SLURM_TMPDIR}/env/bin/activate

# Install AlphaFold and its dependencies.
pip install --no-index --upgrade pip
pip install --no-index --requirement ~/alphafold-requirements.txt

# Edit with the proper arguments and run your commands.
# run_alphafold.py --help
run_alphafold.py \
   --fasta_paths=${INPUT_DIR}/YourSequence.fasta,${INPUT_DIR}/AnotherSequence.fasta \
   --output_dir=${OUTPUT_DIR} \
   --data_dir=${DOWNLOAD_DIR} \
   --db_preset=full_dbs \
   --model_preset=multimer \
   --bfd_database_path=${DOWNLOAD_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
   --mgnify_database_path=${DOWNLOAD_DIR}/mgnify/mgy_clusters_2022_05.fa \
   --pdb70_database_path=${DOWNLOAD_DIR}/pdb70/pdb70 \
   --template_mmcif_dir=${DOWNLOAD_DIR}/pdb_mmcif/mmcif_files \
   --obsolete_pdbs_path=${DOWNLOAD_DIR}/pdb_mmcif/obsolete.dat \
   --pdb_seqres_database_path=${DOWNLOAD_DIR}/pdb_seqres/pdb_seqres.txt \
   --uniprot_database_path=${DOWNLOAD_DIR}/uniprot/uniprot.fasta \
   --uniref30_database_path=${DOWNLOAD_DIR}/uniref30/UniRef30_2021_03 \
   --uniref90_database_path=${DOWNLOAD_DIR}/uniref90/uniref90.fasta \
   --hhblits_binary_path=${EBROOTHHMINSUITE}/bin/hhblits \
   --hhsearch_binary_path=${EBROOTHHMINSUITE}/bin/hhsearch \
   --jackhmmer_binary_path=${EBROOTHMMER}/bin/jackhmmer \
   --kalign_binary_path=${EBROOTKALIGN}/bin/kalign \
   --max_template_date=2022-01-01 \
   --use_gpu_relax=True

File : alphafold-cpu.sh

#!/bin/bash

#SBATCH --job-name=alphafold_run
#SBATCH --account=def-someprof    # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=08:00:00           # adjust this to match the walltime of your job
#SBATCH --cpus-per-task=8         # a MAXIMUM of 8 cores; AlphaFold does not benefit from more
#SBATCH --mem=20G                 # adjust this according to the memory you need

# Load module dependencies.
module load StdEnv/2020 gcc/9.3.0 openmpi/4.0.3 cuda/11.4 cudnn/8.2.0 kalign/2.03 hmmer/3.2.1 openmm-alphafold/7.5.1 hh-suite/3.3.0 python/3.8

DOWNLOAD_DIR=$SCRATCH/alphafold/data   # set the appropriate path to your downloaded data
INPUT_DIR=$SCRATCH/alphafold/input     # set the appropriate path to your input data
OUTPUT_DIR=${SCRATCH}/alphafold/output # set the appropriate path to your output data

# Generate your virtual environment in $SLURM_TMPDIR.
virtualenv --no-download ${SLURM_TMPDIR}/env
source ${SLURM_TMPDIR}/env/bin/activate

# Install AlphaFold and its dependencies.
pip install --no-index --upgrade pip
pip install --no-index --requirement ~/alphafold-requirements.txt

# Edit with the proper arguments and run your commands.
# Note that the `--uniclust30_database_path` option below was renamed to
# `--uniref30_database_path` in 2.3.
# run_alphafold.py --help
run_alphafold.py \
   --fasta_paths=${INPUT_DIR}/YourSequence.fasta,${INPUT_DIR}/AnotherSequence.fasta \
   --output_dir=${OUTPUT_DIR} \
   --data_dir=${DOWNLOAD_DIR} \
   --model_preset=monomer_casp14 \
   --bfd_database_path=${DOWNLOAD_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
   --mgnify_database_path=${DOWNLOAD_DIR}/mgnify/mgy_clusters_2018_12.fa \
   --pdb70_database_path=${DOWNLOAD_DIR}/pdb70/pdb70 \
   --template_mmcif_dir=${DOWNLOAD_DIR}/pdb_mmcif/mmcif_files \
   --obsolete_pdbs_path=${DOWNLOAD_DIR}/pdb_mmcif/obsolete.dat \
   --uniclust30_database_path=${DOWNLOAD_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08  \
   --uniref90_database_path=${DOWNLOAD_DIR}/uniref90/uniref90.fasta  \
   --hhblits_binary_path=${EBROOTHHMINSUITE}/bin/hhblits \
   --hhsearch_binary_path=${EBROOTHHMINSUITE}/bin/hhsearch \
   --jackhmmer_binary_path=${EBROOTHMMER}/bin/jackhmmer \
   --kalign_binary_path=${EBROOTKALIGN}/bin/kalign \
   --max_template_date=2020-05-14 \
   --use_gpu_relax=False

File : alphafold-gpu.sh

#!/bin/bash

#SBATCH --job-name=alphafold_run
#SBATCH --account=def-someprof    # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=08:00:00           # adjust this to match the walltime of your job
#SBATCH --gres=gpu:1              # a GPU helps to accelerate the inference part only
#SBATCH --cpus-per-task=8         # a MAXIMUM of 8 cores; AlphaFold does not benefit from more
#SBATCH --mem=20G                 # adjust this according to the memory you need

# Load module dependencies.
module load StdEnv/2020 gcc/9.3.0 openmpi/4.0.3 cuda/11.4 cudnn/8.2.0 kalign/2.03 hmmer/3.2.1 openmm-alphafold/7.5.1 hh-suite/3.3.0 python/3.8

DOWNLOAD_DIR=$SCRATCH/alphafold/data   # set the appropriate path to your downloaded data
INPUT_DIR=$SCRATCH/alphafold/input     # set the appropriate path to your input data
OUTPUT_DIR=${SCRATCH}/alphafold/output # set the appropriate path to your output data

# Generate your virtual environment in $SLURM_TMPDIR.
virtualenv --no-download ${SLURM_TMPDIR}/env
source ${SLURM_TMPDIR}/env/bin/activate

# Install AlphaFold and its dependencies.
pip install --no-index --upgrade pip
pip install --no-index --requirement ~/alphafold-requirements.txt

# Edit with the proper arguments and run your commands.
# Note that the `--uniclust30_database_path` option below was renamed to
# `--uniref30_database_path` in 2.3.
# run_alphafold.py --help
run_alphafold.py \
   --fasta_paths=${INPUT_DIR}/YourSequence.fasta,${INPUT_DIR}/AnotherSequence.fasta \
   --output_dir=${OUTPUT_DIR} \
   --data_dir=${DOWNLOAD_DIR} \
   --model_preset=monomer_casp14 \
   --bfd_database_path=${DOWNLOAD_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
   --mgnify_database_path=${DOWNLOAD_DIR}/mgnify/mgy_clusters_2018_12.fa \
   --pdb70_database_path=${DOWNLOAD_DIR}/pdb70/pdb70 \
   --template_mmcif_dir=${DOWNLOAD_DIR}/pdb_mmcif/mmcif_files \
   --obsolete_pdbs_path=${DOWNLOAD_DIR}/pdb_mmcif/obsolete.dat \
   --uniclust30_database_path=${DOWNLOAD_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08  \
   --uniref90_database_path=${DOWNLOAD_DIR}/uniref90/uniref90.fasta  \
   --hhblits_binary_path=${EBROOTHHMINSUITE}/bin/hhblits \
   --hhsearch_binary_path=${EBROOTHHMINSUITE}/bin/hhsearch \
   --jackhmmer_binary_path=${EBROOTHMMER}/bin/jackhmmer \
   --kalign_binary_path=${EBROOTKALIGN}/bin/kalign \
   --max_template_date=2020-05-14 \
   --use_gpu_relax=True

Then, submit the job to the scheduler.

(alphafold_env) [name@server ~] sbatch --job-name alphafold-X alphafold-gpu.sh
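
Before submitting, it can save queue time to confirm that every input file named in <code>--fasta_paths</code> exists and looks like FASTA. The sketch below assumes only that a FASTA file is non-empty and starts with a <code>></code> header line; <code>check_fasta</code> is a hypothetical helper, not an AlphaFold or Slurm command.

```shell
# Hypothetical pre-submission check (not part of AlphaFold or Slurm):
# verify that each given file is non-empty and that its first line is a
# FASTA header, i.e. starts with '>'.
check_fasta() {
    local status=0
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        elif ! head -n 1 "$f" | grep -q '^>'; then
            echo "first line is not a FASTA header: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example use, with the file names from the scripts above:
# check_fasta "$INPUT_DIR/YourSequence.fasta" "$INPUT_DIR/AnotherSequence.fasta" \
#     && sbatch --job-name alphafold-X alphafold-gpu.sh
```

This does not validate the sequences themselves, only that the files are present and plausibly formatted.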

Troubleshooting

Broken pipe error message

When downloading the database, you may encounter the error message "client_loop: send disconnect: Broken pipe". The exact cause is hard to pin down. It could be as simple as an unusually high number of users working on the login node, leaving less capacity for your own data transfer.

One solution is to run the download inside a terminal multiplexer, so that it keeps running if your connection drops. You may still encounter this error message, but it is less likely.

A second solution is to use the database that is already present on the cluster, in /cvmfs/bio.data.computecanada.ca/content/databases/Core/alphafold2_dbs/2023_07/.
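
As a sketch of how a submission script might prefer the cluster copy and fall back to a personal download: <code>first_existing_dir</code> is a hypothetical helper, and the fallback location <code>$SCRATCH/alphafold/data</code> is only the example path used in the scripts above. The database file names passed to <code>run_alphafold.py</code> still have to match what the chosen directory actually contains.

```shell
# Path of the database copy mentioned above.
CVMFS_DBS=/cvmfs/bio.data.computecanada.ca/content/databases/Core/alphafold2_dbs/2023_07

# Hypothetical helper: print the first candidate directory that exists,
# or fail if none of them does.
first_existing_dir() {
    local d
    for d in "$@"; do
        if [ -d "$d" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

# Example use in a submission script (the fallback path is an assumption):
# DOWNLOAD_DIR=$(first_existing_dir "$CVMFS_DBS" "$SCRATCH/alphafold/data") || exit 1
```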

Another option is to download the full database in sections. To see the available download scripts, load the module and activate your virtual environment, then type download_ in your terminal and press the Tab key twice to list them. You can then download individual sections of the database with the corresponding script, for instance download_pdb.sh.
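
When downloading in sections, a small retry wrapper can absorb transient failures. This is a minimal sketch: <code>retry</code> and <code>RETRY_DELAY</code> are hypothetical names, and it assumes that a section script can safely be run again after a failure and that it takes the download directory as its argument, both of which you should confirm for the script you use.

```shell
# Hypothetical helper: run a command up to max_attempts times, waiting
# RETRY_DELAY seconds (default 60) between attempts.
retry() {
    local max_attempts=$1
    shift
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-60}"
    done
}

# Example use (assumes the script accepts the download directory as argument):
# retry 3 download_pdb.sh "$DOWNLOAD_DIR"
#
# In an interactive bash shell, `compgen -c download_ | sort -u` prints the
# same list of script names as pressing Tab twice.
```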


Retrieved from "https://docs.alliancecan.ca/mediawiki/index.php?title=AlphaFold&oldid=152755"

Latest revision as of 12:47, 1 May 2024
