MetaPhlAn

From Alliance Doc

Revision as of 13:52, 27 October 2022


This article is a draft

This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.




MetaPhlAn is a "computational tool for profiling the composition of microbial communities (Bacteria, Archaea and Eukaryotes) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level. With StrainPhlAn, it is possible to perform accurate strain-level microbial profiling", according to its GitHub repository.

The software stack on our clusters contains modules for two older versions of this software (2.2.0 and 2.8), but we now expect users to install more recent versions in a Python virtual environment. Wheels for the following more recent versions of MetaPhlAn are available in our wheelhouse: 3.0.0a1, 3.0.7 and 4.0.2. You should begin by loading the modules that the Python wheel depends on,

[name@server ~]$ module load gcc blast samtools bedtools bowtie2 python/3.9

after which you can create the virtual environment,

[name@server ~]$ virtualenv --no-download --clear $HOME/ENV

You should then enter the virtual environment,

[name@server ~]$ source $HOME/ENV/bin/activate

update pip if necessary,

(ENV) [name@server ~]$ pip install --no-index --upgrade pip

and finally install the wheel,

(ENV) [name@server ~]$ pip install metaphlan
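
Before going further, it can be worth confirming that the environment is usable. The following is a minimal sketch and not part of the official instructions: the helper names `check_tools` and `in_virtualenv` are hypothetical, and the usage shown afterwards assumes that the loaded modules provide executables named `bowtie2`, `samtools`, `bedtools` and `blastn` and that the wheel installs a `metaphlan` command.

```shell
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 if every command is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical helper: succeed only if a virtual environment is active.
# The activate script sets the VIRTUAL_ENV variable.
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}
```

With the modules loaded and the environment activated, `in_virtualenv && check_tools bowtie2 samtools bedtools blastn metaphlan` should finish silently; any missing command is listed on standard error.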

Initialization

To work correctly, MetaPhlAn needs to download certain databases from a remote server and then compute indices derived from the components of these databases. On clusters which do not permit Internet access from the compute nodes, these databases have to be downloaded on a login node with a tool such as wget,

(ENV) [name@server ~]$ parallel wget ::: http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103.tar http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103_marker_info.txt.bz2 http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt.bz2
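
On clusters without Internet access from the compute nodes, a job cannot fetch a file that is missing, so it may help to confirm that all three files arrived before continuing. This is a minimal sketch: the function name `check_downloads` is hypothetical, and the file names are those used in the wget command above.

```shell
# Hypothetical helper: verify that the three downloaded files exist and are non-empty.
# Usage: check_downloads DIRECTORY INDEX_NAME
check_downloads() {
    dir=$1
    index=$2
    rc=0
    for suffix in .tar _marker_info.txt.bz2 _species.txt.bz2; do
        file="$dir/$index$suffix"
        # -s is true only for a file that exists and has a size greater than zero
        if [ ! -s "$file" ]; then
            echo "missing or empty: $file" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, if the files were downloaded into your home directory, `check_downloads "$HOME" mpa_vJan21_CHOCOPhlAnSGB_202103` returns a non-zero status and names any file that is absent or empty.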

You can then untar the database file and compute the indices in a job, so as not to put an undue computational burden on the shared login node. A sample script follows,

File : job.sh

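#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=10G

# Load the same modules that were used when installing the wheel
module load gcc blast samtools bedtools bowtie2 python/3.9
cd $HOME
# Decompress the marker information file using 4 processors
pbunzip2 -p4 mpa_vJan21_CHOCOPhlAnSGB_202103_marker_info.txt.bz2
source ENV/bin/activate
# Install the database in the current directory using 4 CPUs
metaphlan --nproc 4 --install --index mpa_vJan21_CHOCOPhlAnSGB_202103 --bowtie2db $PWD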
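
The script can be submitted with `sbatch job.sh`. Once the job has finished, one way to check the result is to count the Bowtie2 index files in the database directory. The sketch below assumes that the index files are named after the index and carry a `.bt2` or `.bt2l` extension, as Bowtie2 indices normally do; the function name `count_index_files` is hypothetical.

```shell
# Hypothetical helper: count Bowtie2 index files for a given index name.
# Assumes files are named INDEX_NAME.<part>.bt2 or INDEX_NAME.<part>.bt2l.
# Usage: count_index_files DIRECTORY INDEX_NAME
count_index_files() {
    dir=$1
    index=$2
    count=0
    for file in "$dir/$index".*.bt2*; do
        # When nothing matches, the loop sees the unexpanded pattern; skip it
        if [ -e "$file" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

A Bowtie2 index normally consists of six files, so `count_index_files "$HOME" mpa_vJan21_CHOCOPhlAnSGB_202103` printing a smaller number would suggest that the index was not built completely.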