Revision as of 21:59, 23 November 2023


BUSCO (Benchmarking Universal Single-Copy Orthologs) is an application for assessing the completeness of genome assemblies and annotations.

For more information, see the user manual (https://busco.ezlab.org/busco_userguide.html).

2. Copy the configuration file.

[name@server ~]$ cp -v $EBROOTBUSCO/config/config.ini.default $HOME/busco_config.ini

or

[name@server ~]$ wget -O $HOME/busco_config.ini https://gitlab.com/ezlab/busco/raw/master/config/config.ini.default


3. Edit the configuration file. The paths to the external tools are at the end of this file; the relevant section is reproduced in the partial_busco_config.ini file below.

5. Test the installation.


[name@server ~]$ export BUSCO_CONFIG_FILE=$HOME/busco_config.ini
[name@server ~]$ export AUGUSTUS_CONFIG_PATH=$HOME/augustus_config
[name@server ~]$ run_BUSCO.py --in $EBROOTBUSCO/sample_data/target.fa --out TEST --lineage_path $EBROOTBUSCO/sample_data/example --mode genome

Test

4. Download the test data:

[name@server ~]$ wget https://gitlab.com/ezlab/busco/-/raw/master/test_data/bacteria/genome.fna
[name@server ~]$ wget https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2020-03-06.tar.gz

5. Run the command:

[name@server ~]$ busco --offline --in genome.fna --out TEST --lineage_dataset bacteria_odb10 --mode genome --cpu ${SLURM_CPUS_PER_TASK:-1}
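The --cpu value relies on shell parameter expansion: ${SLURM_CPUS_PER_TASK:-1} expands to the number of CPUs Slurm allocated with --cpus-per-task, or to 1 when the variable is unset or empty (for example on a login node). The sketch below simulates both cases by setting the variable by hand; inside a real job, Slurm sets it for you.

```shell
# Outside a Slurm job (e.g. on a login node) the variable is unset,
# so the default after ":-" is used.
unset SLURM_CPUS_PER_TASK
cpus_outside_job=${SLURM_CPUS_PER_TASK:-1}

# Simulate a job submitted with --cpus-per-task=8 (Slurm normally sets this).
SLURM_CPUS_PER_TASK=8
cpus_in_job=${SLURM_CPUS_PER_TASK:-1}

# ":-" also substitutes the default when the variable is set but empty;
# the form without the colon ("-") only does so when it is unset.
SLURM_CPUS_PER_TASK=""
with_colon=${SLURM_CPUS_PER_TASK:-1}
without_colon=${SLURM_CPUS_PER_TASK-1}

echo "outside=$cpus_outside_job in_job=$cpus_in_job with_colon=$with_colon without_colon=$without_colon"

# Clean up so the simulation does not leak into later commands.
unset SLURM_CPUS_PER_TASK
```

This is why the ":-" form is the safer choice in commands meant to run both interactively and inside a job.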

To list the available datasets from your terminal, run busco --list-datasets.

There are two ways to download datasets:

Busco download command

6.1 Use the busco --download command (preferred method).

To download a single dataset, run this command in your working directory:

[name@server ~]$ busco --download bacteria_odb10

It is also possible to do a bulk download by using the following arguments in place of the dataset name: "all", "prokaryota", "eukaryota", or "virus".

[name@server ~]$ busco --download virus

This will:

1. Create the BUSCO directory hierarchy for datasets.

2. Download the requested datasets.

3. Decompress the file(s).

4. If you download several datasets, they are all added automatically to the lineages directory.

The directory hierarchy will look like this:

busco_downloads/
    information/
        lineages_list.2021-12-14.txt
    lineages/
        bacteria_odb10
        actinobacteria_class_odb10
        actinobacteria_phylum_odb10
    placement_files/
        list_of_reference_markers.archaea_odb10.2019-12-16.txt

All your lineage files will then be in busco_downloads/lineages/. When you pass --download_path busco_downloads/ on the busco command line, BUSCO knows where to find the dataset given by --lineage_dataset bacteria_odb10. If the busco_downloads directory is not in your working directory, provide its full path.
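Before submitting a job that uses --offline, you can check that a dataset is where BUSCO will look for it. The function below is a hypothetical helper (check_lineage is not part of BUSCO); it only tests that DOWNLOAD_PATH/lineages/NAME exists as a directory, following the layout described above.

```shell
# Hypothetical helper: check that a lineage dataset is where
# "--download_path DIR --lineage_dataset NAME" expects it.
check_lineage() {
    local download_path=$1
    local lineage=$2
    # Strip a trailing slash so "busco_downloads/" and "busco_downloads" both work.
    local dataset_dir="${download_path%/}/lineages/${lineage}"
    if [ -d "$dataset_dir" ]; then
        echo "found: $dataset_dir"
    else
        echo "missing: $dataset_dir" >&2
        return 1
    fi
}
```

For example, check_lineage busco_downloads/ bacteria_odb10 prints the dataset path if it is present and returns a non-zero status otherwise.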

Wget download command

6.2 Use wget. For example:

Every downloaded archive must then be decompressed with tar -xvf file.tar.gz.

[name@server ~]$ mkdir -p busco_downloads/lineages
[name@server ~]$ cd busco_downloads/lineages
[name@server ~]$ wget https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2020-03-06.tar.gz
[name@server ~]$ tar -xvf bacteria_odb10.2020-03-06.tar.gz
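When several archives have been downloaded with wget, they can be decompressed in one pass. extract_lineages is a hypothetical helper sketched here; it runs tar -xzf on every .tar.gz file in the given directory (default: the current one) and extracts into that same directory, leaving the archives in place.

```shell
# Hypothetical helper: extract every .tar.gz archive in a directory
# (default: the current directory) into that same directory.
extract_lineages() {
    local dir=${1:-.}
    local archive
    for archive in "$dir"/*.tar.gz; do
        # If nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$archive" ] || continue
        echo "extracting $archive"
        tar -xzf "$archive" -C "$dir"
    done
}
```

Run it from busco_downloads/lineages with extract_lineages, or from another directory with extract_lineages busco_downloads/lineages.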

4. Copy the Augustus configuration directory to a writable location.

[name@server ~]$ cp -r $EBROOTAUGUSTUS/config $HOME/augustus_config


Troubleshooting

Message: Cannot write to Augustus config path

Make sure the configuration directory is in a writable location and that the AUGUSTUS_CONFIG_PATH variable is set.
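Both conditions can be checked from the shell before launching BUSCO. check_augustus_config is a hypothetical helper; it only verifies that AUGUSTUS_CONFIG_PATH is set and names a directory you can write to, not that the copied configuration is complete.

```shell
# Hypothetical helper: check the two usual causes of
# "Cannot write to Augustus config path".
check_augustus_config() {
    if [ -z "${AUGUSTUS_CONFIG_PATH:-}" ]; then
        echo "AUGUSTUS_CONFIG_PATH is not set" >&2
        return 1
    fi
    if [ ! -d "$AUGUSTUS_CONFIG_PATH" ] || [ ! -w "$AUGUSTUS_CONFIG_PATH" ]; then
        echo "not a writable directory: $AUGUSTUS_CONFIG_PATH" >&2
        return 1
    fi
    echo "OK: AUGUSTUS_CONFIG_PATH=$AUGUSTUS_CONFIG_PATH"
}
```

For example, after export AUGUSTUS_CONFIG_PATH=$HOME/augustus_config, running check_augustus_config should print the OK line.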

To analyze a single genome:

[name@server ~]$ busco --offline --in genome.fna --out TEST --lineage_dataset bacteria_odb10 --mode genome --cpu ${SLURM_CPUS_PER_TASK:-1} --download_path busco_downloads/

To analyze multiple genomes stored in the genome/ directory (like busco_downloads/, the genome/ directory must be in the current directory; otherwise provide its full path):

[name@server ~]$ busco --offline --in genome/ --out TEST --lineage_dataset bacteria_odb10 --mode genome --cpu ${SLURM_CPUS_PER_TASK:-1} --download_path busco_downloads/

This command should take less than 60 seconds. Longer-running jobs must be submitted to the scheduler.

Available versions

Recent versions are available as wheels; the oldest version is available as a module (see the Modules section below).

Specify --in genome.fna to analyze a single file.

Specify --in genome/ to analyze multiple files.

Version 3.0.2 is available as a module on CVMFS and can be used on all clusters; instructions for using it are given below. Newer versions can be installed locally in a virtual environment as follows:

[name@server ~]$ module load python/3.7.4
[name@server ~]$ git clone https://gitlab.com/ezlab/busco.git
[name@server ~]$ virtualenv /home/$USER/busco_env
[name@server ~]$ source /home/$USER/busco_env/bin/activate
(busco_env) [name@server ~]$ pip install Biopython
(busco_env) [name@server ~]$ cd ~/busco
(busco_env) [name@server ~]$ python setup.py install
(busco_env) [name@server ~]$ cp -r scripts test_data /home/$USER/busco_env/

and add /home/$USER/busco_env/scripts to your PATH.
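For the current shell session, the directory can be added to PATH as follows; the path assumes the virtual environment location used in the commands above. To make the change permanent, the same export line can be added to ~/.bashrc.

```shell
# Prepend the BUSCO scripts directory (copied into the virtual environment above)
# so that its scripts can be run by name in this shell session.
export PATH="/home/$USER/busco_env/scripts:$PATH"

# Show the first directory the shell now searches.
echo "$PATH" | cut -d: -f1
```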

Job submission

Here is an example submission script. Submit it with sbatch run_busco.sh.

File : run_busco.sh

#!/bin/bash

#SBATCH --job-name=busco9_run
#SBATCH --account=def-someprof    # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=01:00:00           # adjust this to match the walltime of your job
#SBATCH --cpus-per-task=8         # adjust depending on the size of the genome(s)/protein(s)/transcriptome(s)
#SBATCH --mem=20G                 # adjust this according to the memory you need

# Load module dependencies.
module load StdEnv/2020 gcc python augustus hmmer blast+ metaeuk prodigal r bbmap

# Generate your virtual environment in $SLURM_TMPDIR.
virtualenv --no-download ${SLURM_TMPDIR}/env
source ${SLURM_TMPDIR}/env/bin/activate

# Install busco and its dependencies.
pip install --no-index --upgrade pip
pip install --no-index --requirement ~/busco-requirements.txt

# Edit with the proper arguments, run your commands.
busco --offline --in genome.fna --out TEST --lineage_dataset bacteria_odb10 --mode genome --cpu ${SLURM_CPUS_PER_TASK:-1} --download_path busco_downloads/

Augustus parameters

9. Advanced users can pass their own parameters to Augustus with --augustus_parameters="--yourAugustusParameter".

Copy the Augustus config directory to a writable location:

[name@server ~]$ cp -r $EBROOTAUGUSTUS/config $HOME/augustus_config

Make sure to define the AUGUSTUS_CONFIG_PATH environment variable:

[name@server ~]$ export AUGUSTUS_CONFIG_PATH=$HOME/augustus_config

SEPP parameters

10. To use SEPP, you must install it locally in your BUSCO virtual environment. Do this on a login node.

10.1. Activate your BUSCO virtual environment:

[name@server ~]$ source busco_env/bin/activate

10.2. Install dendropy:

[name@server ~]$ pip install 'dendropy<4.6'

10.3. Install SEPP:

[name@server ~]$ git clone https://github.com/smirarab/sepp.git
[name@server ~]$ cd sepp
[name@server ~]$ python setup.py config
[name@server ~]$ python setup.py install

10.4. Validate the installation:

[name@server ~]$ cd
[name@server ~]$ run_sepp.py -h

10.5. Because SEPP is installed in your local virtual environment, you cannot create a new virtual environment in the job as in the submission script above. Instead, activate your local virtual environment by adding this command right after the module load line:

[name@server ~]$ source ~/busco_env/bin/activate

Modules

Deprecation

This section is outdated. We are currently working on updating it.

1. Load the required modules.

[name@server ~]$ module load StdEnv/2018.3 gcc/7.3.0 openmpi/3.1.4 busco/3.0.2 r/4.0.2

This also loads the modules for augustus, blast+, hmmer and other packages required by BUSCO.

The run_BUSCO.py command should take less than 60 seconds. Longer-running jobs must be submitted to the scheduler.

For more information, see the user manual.

File : partial_busco_config.ini

[tblastn]
# path to tblastn
path = /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx512/Compiler/gcc7.3/blast+/2.7.1/bin/

[makeblastdb]
# path to makeblastdb
path = /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx512/Compiler/gcc7.3/blast+/2.7.1/bin/

[augustus]
# path to augustus
path = /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx512/Compiler/gcc7.3/augustus/3.3/bin/

[etraining]
# path to augustus etraining
path = /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx512/Compiler/gcc7.3/augustus/3.3/bin/

# path to augustus perl scripts, redeclare it for each new script
[gff2gbSmallDNA.pl]
path = /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx512/Compiler/gcc7.3/augustus/3.3/scripts/
[new_species.pl]
path = /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx512/Compiler/gcc7.3/augustus/3.3/scripts/
[optimize_augustus.pl]
path = /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx512/Compiler/gcc7.3/augustus/3.3/scripts/

[hmmsearch]
# path to HMMsearch executable
path = /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx512/Compiler/gcc7.3/hmmer/3.1b2/bin/

[Rscript]
# path to Rscript, if you wish to use the plot tool
path = /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx512/Compiler/gcc7.3/r/4.0.2/bin/
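Instead of copying the paths above by hand, the [tool] / path entries for the executables can be printed from the modules currently loaded. print_tool_paths is a hypothetical helper that looks up each tool with command -v and prints the directory containing it. It does not cover the Augustus Perl script entries (gff2gbSmallDNA.pl and the others), whose paths point to the scripts/ directory, so check those separately.

```shell
# Hypothetical helper: print a config.ini-style entry for each executable
# found on PATH (e.g. after the module load command in step 1).
print_tool_paths() {
    local tool exe
    for tool in "$@"; do
        if exe=$(command -v "$tool"); then
            printf '[%s]\npath = %s/\n\n' "$tool" "$(dirname "$exe")"
        else
            echo "# $tool not found on PATH" >&2
        fi
    done
}
```

For example, after loading the modules, print_tool_paths tblastn makeblastdb augustus etraining hmmsearch Rscript prints entries that can be compared with the end of your configuration file.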

To see the latest available version, run:

[name@server ~]$ avail_wheel busco

Python wheels

Installation

1. Load the required modules.

[name@server ~]$ module load StdEnv/2020 gcc python/3.10
[name@server ~]$ module load augustus hmmer blast+ metaeuk prodigal r

2. Create the virtual environment.

[name@server ~]$ virtualenv busco_env
[name@server ~]$ source busco_env/bin/activate

3. Install the wheel and its dependencies.

(busco_env) $ pip install biopython pandas busco --no-index

Usage

Datasets

Before submitting your job, download the datasets from Index of /v5/data/.

Options for your script

Specify --offline so that BUSCO does not try to access the internet.

