AlphaFold



This article is a draft

This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.



This package provides an implementation of the inference pipeline of AlphaFold v2.0. This is a completely new model that was entered in CASP14 and published in Nature. For simplicity, we refer to this model as AlphaFold throughout the rest of this document.

Any publication that discloses findings arising from using this source code or the model parameters should cite the AlphaFold paper.

The source code of this package, along with some documentation, is available on the AlphaFold GitHub page.

Usage on Compute Canada systems

The upstream documentation explains how to run AlphaFold using Docker. Compute Canada systems do not provide Docker; the supported container technology is Singularity (see our documentation at https://docs.computecanada.ca/wiki/Singularity). As an alternative, we have built a wheel so that AlphaFold can be used in a Python virtual environment.

AlphaFold in a Python environment

AlphaFold has a number of non-Python dependencies that need to be loaded ahead of time, for example CUDA, kalign, HMMER, and OpenMM. All of these dependencies are available through our software stack:

[name@cluster ~]$ module load gcc openmpi cuda/11.1 cudacore/.11.1.1 cudnn/8.2.0 kalign hmmer openmm python/3.7
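
After loading the modules, it can be worth confirming that the external tools are actually on your PATH before going further. The following is a minimal sketch, not an official procedure: the helper name missing_tools is hypothetical, and the binary names (kalign, jackhmmer, hmmsearch, hmmbuild) are assumed to be the executables provided by the kalign and hmmer modules.

```shell
#!/bin/bash
# Print the name of every command in the argument list that is not on PATH.
# Returns 0 if all commands were found, 1 if at least one is missing.
# (missing_tools is a hypothetical helper, not part of any module.)
missing_tools() {
    local rc=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Assumed binary names; adjust to match what the loaded modules provide.
if missing_tools kalign jackhmmer hmmsearch hmmbuild; then
    echo "All external tools found."
else
    echo "Some tools are missing; check your 'module load' line." >&2
fi
```

If any names are printed, the corresponding module is probably not loaded in the current shell.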

Next, create a Python virtual environment and activate it:

[name@cluster ~]$ virtualenv --no-download ~/my_env && source ~/my_env/bin/activate


Now install AlphaFold and its dependencies:

(my_env)[name@cluster ~]$ pip install --no-index six==1.15 numpy==1.19.2 scipy==1.4.1 pdbfixer alphafold


Now AlphaFold is ready to be used.
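
A quick way to confirm the installation is to try importing the packages from the active virtual environment. This is only a sketch: can_import is a hypothetical helper, and the import names (numpy, scipy, pdbfixer, alphafold) are assumed to match the package names installed above.

```shell
#!/bin/bash
# Return 0 if the given Python module can be imported by the interpreter
# named in $PYTHON (defaults to "python", i.e. the active virtualenv).
# (can_import is a hypothetical helper for illustration.)
can_import() {
    "${PYTHON:-python}" -c "import $1" >/dev/null 2>&1
}

# Assumed import names; they may differ from the pip package names.
for module in numpy scipy pdbfixer alphafold; do
    if can_import "$module"; then
        echo "ok:      $module"
    else
        echo "MISSING: $module"
    fi
done
```

A MISSING line usually means the virtual environment is not activated or the pip install step did not complete.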

Creating the virtual environment in the jobscript

As described in [Creating_and_using_a_virtual_environment](https://docs.computecanada.ca/wiki/Python#Creating_and_using_a_virtual_environment), you can also create the virtual environment on the compute node's local storage ($SLURM_TMPDIR) from within the job script:


File : my_alphafoldjob.sh

#!/bin/bash
#SBATCH --job-name=alphafold_run
#SBATCH --account=def-someprof # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=0-03:00         # adjust this to match the walltime of your job
#SBATCH --nodes=1      
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1           # You need to request one GPU to be able to run AlphaFold properly
#SBATCH --cpus-per-task=1      # adjust this if you are using parallel commands
#SBATCH --mem=4000             # adjust this according to the memory requirement per node you need
#SBATCH --mail-user=you@youruniversity.ca # adjust this to match your email address
#SBATCH --mail-type=ALL

# Load your modules as before
module load gcc openmpi cuda/11.1 cudacore/.11.1.1 cudnn/8.2.0 kalign hmmer openmm python/3.7

# Generate your virtual environment in $SLURM_TMPDIR
virtualenv --no-download ${SLURM_TMPDIR}/my_env && source ${SLURM_TMPDIR}/my_env/bin/activate

# Install alphafold and dependencies
pip install --no-index six==1.15 numpy==1.19.2 scipy==1.4.1 pdbfixer alphafold

# Run your commands
python run_alphafold.py --help
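
Once the script is saved, it is submitted with sbatch. Because the job script above notes that AlphaFold needs a GPU, a small pre-submission check can catch a script that has lost its GPU request. The following is a sketch under stated assumptions: requests_gpu is a hypothetical helper and not part of Slurm, and it only recognizes the --gres=gpu form used on this page.

```shell
#!/bin/bash
# Return 0 if the job script contains an #SBATCH directive of the form
# --gres=gpu... (other ways of requesting GPUs are not detected).
# (requests_gpu is a hypothetical helper, not a Slurm command.)
requests_gpu() {
    grep -Eq '^#SBATCH[[:space:]]+--gres=gpu' "$1"
}

script=my_alphafoldjob.sh
if [ -f "$script" ] && requests_gpu "$script"; then
    # Only submit when Slurm's sbatch is actually available on this host.
    if command -v sbatch >/dev/null 2>&1; then
        sbatch "$script"
    fi
else
    echo "Not submitting: $script is missing or requests no GPU." >&2
fi
```

After submission, squeue -u $USER lists your pending and running jobs.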


Databases

Using singularity