MetaPhlAn: Difference between revisions
Revision as of 19:05, 26 October 2022
This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.
MetaPhlAn is a "computational tool for profiling the composition of microbial communities (Bacteria, Archaea and Eukaryotes) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level. With StrainPhlAn, it is possible to perform accurate strain-level microbial profiling", according to its GitHub repository. While the software stack on our clusters does contain modules for a couple of older versions (2.2.0 and 2.8) of this software, we now expect users to install more recent versions using a Python virtual environment. Wheels are available in our wheelhouse for these more recent versions of the MetaPhlAn software: 3.0.0a1, 3.0.7 and 4.0.2. You should begin by loading certain modules needed by the Python wheel,
[name@server ~]$ module load gcc blast samtools bedtools bowtie2 python/3.9
after which you can create the virtual environment,
[name@server ~]$ virtualenv --no-download --clear $HOME/ENV
You should then enter the virtual environment,
[name@server ~]$ source $HOME/ENV/bin/activate
update pip if necessary,
(ENV) [name@server ~]$ pip install --no-index --upgrade pip
and finally install the wheel,
(ENV) [name@server ~]$ pip install --no-index metaphlan
Initialization
Before it can be used, MetaPhlAn needs to download certain databases from a remote server and then compute indices derived from the components of these databases. On clusters which do not permit Internet access from the compute nodes, these databases have to be downloaded on a login node with a tool such as wget,
(ENV) [name@server ~]$ wget http://cmprod1.cibio.unitn.it/biobakery3/metaphlan_databases/mpa_v31_CHOCOPhlAn_201901.tar
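Since the index computation happens later in a batch job, it can be worth confirming on the login node that the download completed. The following is a minimal sketch and not part of MetaPhlAn: the function name check_db_archive is hypothetical, and it only tests that tar can list the archive, which catches a missing or unreadable file but does not verify a checksum.

```shell
# Hypothetical helper (not a MetaPhlAn command): succeed only if the file
# exists and tar is able to list its contents.
check_db_archive() {
    local archive="$1"
    if [ ! -f "$archive" ]; then
        echo "missing: $archive" >&2
        return 1
    fi
    # Listing the members makes tar read through the whole archive.
    if ! tar -tf "$archive" > /dev/null 2>&1; then
        echo "unreadable: $archive" >&2
        return 1
    fi
    echo "ok: $archive"
}
```

For the download above, it would be called as check_db_archive mpa_v31_CHOCOPhlAn_201901.tar from the directory containing the file.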
You can then untar the database file and compute the indices inside a job, so as not to place an undue computational burden on the shared login node. A sample script is the following,
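#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=10G
# Load the same modules that were used when the wheel was installed
module load gcc blast samtools bedtools bowtie2 python/3.9
# Work in $HOME, where both the virtual environment and the database file are located
cd "$HOME"
source ENV/bin/activate
# Install the database and compute its indices in the current directory
metaphlan --install --index mpa_v31_CHOCOPhlAn_201901 --bowtie2db "$PWD"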
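Once the job has finished, a quick check that the index files are present may save a failed run later. The sketch below is an illustration only: the function name count_index_files is hypothetical, and it assumes the usual Bowtie2 naming convention in which index files end in .bt2 (or .bt2l for large indexes) and share the database name as a prefix.

```shell
# Hypothetical helper: count the Bowtie2 index files found for a given
# database name in a directory. Assumes the <index>.*.bt2 / .bt2l naming.
count_index_files() {
    local dir="$1" index="$2" n=0 f
    for f in "$dir/$index".*.bt2 "$dir/$index".*.bt2l; do
        # An unmatched pattern stays literal, so test that the file exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

It would be called as count_index_files "$HOME" mpa_v31_CHOCOPhlAn_201901. A complete Bowtie2 index normally consists of six files, so a smaller count suggests that the index computation did not finish.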