BUSCO/en: Difference between revisions

Updating to match new version of source page
Line 3: Line 3:




BUSCO stands for <i>Benchmarking sets of Universal Single-Copy Orthologs</i>
BUSCO (<i>Benchmarking sets of Universal Single-Copy Orthologs</i>) is an application for assessing genome assembly and annotation completeness.
It is an application for assessing genome assembly and annotation completeness.


For more information, see the [https://busco.ezlab.org/busco_userguide.html user manual].
For more information, see the [https://busco.ezlab.org/busco_userguide.html user manual].


== Available versions ==
== Available versions ==
Recent versions are available as wheels. Older versions are available as a module; please see the module section below.
Recent versions are available as wheels. Older versions are available as a module; please see the [[#Modules|Modules]] section below.


To see the latest available version, run
To see the latest available version, run
{{Command|avail_wheel busco}}
{{Command|avail_wheel busco}}


== Python Wheel ==
== Python wheel ==
=== Installation ===
=== Installation ===
1. Load the necessary modules.
<b>1.</b> Load the necessary modules.
{{Commands
{{Commands
|module load StdEnv/2020 gcc python/3.10 augustus hmmer blast+ metaeuk prodigal r bbmap}}
|module load StdEnv/2020 gcc python/3.10 augustus hmmer blast+ metaeuk prodigal r bbmap}}


2. Create the virtual environment.
<b>2.</b> Create the virtual environment.
{{Commands
{{Commands
|virtualenv ~/busco_env
|virtualenv ~/busco_env
Line 26: Line 25:
}}
}}


3. Install the wheel and its dependencies.
<b>3.</b> Install the wheel and its dependencies.
{{Command
{{Command
|prompt=(busco_env) $
|prompt=(busco_env) $
Line 32: Line 31:
}}
}}


4. Validate it.
<b>4.</b> Validate it.
{{Command
{{Command
|prompt=(busco_env) $
|prompt=(busco_env) $
Line 38: Line 37:
}}
}}


'''5.''' Freeze the environment and requirements set. For requirements text file usage, have a look at the bash submission script described in point number 8.
<b>5.</b> Freeze the environment and requirements set. To use the requirements text file, see the <i>bash</i> submission script shown at point 8.
{{Command
{{Command
|prompt=(busco_env) $
|prompt=(busco_env) $
Line 46: Line 45:
=== Usage ===
=== Usage ===
==== Datasets ====
==== Datasets ====
'''6.''' You must pre-download any datasets from [https://busco-data.ezlab.org/v5/data/ busco data] before submitting your job.
<b>6.</b> You must pre-download any datasets from [https://busco-data.ezlab.org/v5/data/ BUSCO data] before submitting your job.


You can list the available datasets in your terminal by typing <code>busco --list-datasets</code>.
You can list the available datasets in your terminal by typing <code>busco --list-datasets</code>.


You have '''two''' options for datasets download:
You have <b>two</b> options to download datasets:<br>
 
*use the <code>busco</code> command,
===== Busco download command =====
*use the <code>wget</code> command.
'''6.1''' Use busco download command (preferred method). Here is one example:
 
Type this command in your working directory to download one particular dataset:


===== <b>6.1</b>  Using the <code>busco</code> command =====
This is the preferred option. Type this command in your working directory to download a particular dataset, for example
{{Commands
{{Commands
|busco --download bacteria_odb10
|busco --download bacteria_odb10
}}
}}


It is also possible to do a bulk download by using the following arguments in place of the dataset name: "all", "prokaryota", "eukaryota", or "virus".
It is also possible to do a bulk download by replacing the dataset name with one of the following arguments: <code>all</code>, <code>prokaryota</code>, <code>eukaryota</code>, or <code>virus</code>, for example


{{Commands
{{Commands
|busco --download virus
|busco --download virus
}}
}}
This will
::1. create a BUSCO directory hierarchy for the datasets,
::2. download the appropriate datasets,
::3. decompress the file(s),
::4. automatically add every downloaded file to the lineages directory if you download several.


This will:
The hierarchy will look like this:
 
::1. Create the BUSCO directory hierarchy for datasets.
::2. Download the appropriate datasets.
::3. Decompress the file(s).
::4. If you download multiple files, they will all be automatically added to the lineages directory.

The directory hierarchy will look as follows:
 
<blockquote>
<blockquote>
* busco_downloads/
* busco_downloads/
Line 96: Line 91:
</blockquote>
</blockquote>


Doing so, all your lineage files should be in '''busco_downloads/lineages/'''. When you pass <code>--download_path busco_downloads/</code> on the BUSCO command line, BUSCO will know where to find the dataset named by <code>--lineage_dataset bacteria_odb10</code>. If the busco_downloads directory is not in your working directory, you will need to provide the full path.
Doing so, all your lineage files should be in <b>busco_downloads/lineages/</b>. When you pass <code>--download_path busco_downloads/</code> on the BUSCO command line, BUSCO will know where to find the dataset named by <code>--lineage_dataset bacteria_odb10</code>. If the <i>busco_downloads</i> directory is not in your working directory, you will need to provide the full path.
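
As a minimal sketch of the layout described above, the following shell snippet checks that a lineage directory exists under the download path before a job is submitted. The helper name <code>check_lineage</code> is hypothetical and not part of BUSCO; the paths are the ones used in this example.

```shell
# Hypothetical helper (not part of BUSCO): succeeds if the lineage
# directory exists under <download_path>/lineages/.
check_lineage() {
    download_path=$1
    lineage=$2
    # ${download_path%/} drops a trailing slash, if any.
    [ -d "${download_path%/}/lineages/${lineage}" ]
}

if check_lineage busco_downloads bacteria_odb10; then
    echo "found busco_downloads/lineages/bacteria_odb10"
else
    echo "missing busco_downloads/lineages/bacteria_odb10; download it first"
fi
```

If the directory is reported missing, either the dataset has not been downloaded yet or the command is being run from a different working directory, in which case the full path should be given.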


=====Wget download command =====
=====<b>6.2</b> Using the <code>wget</code> command =====
'''6.2''' Use wget download command. Here is one example:


All files must be decompressed: <code>tar -xvf file.tar.gz</code>
All files must be decompressed with <code>tar -xvf file.tar.gz</code>.
{{Commands
{{Commands
|mkdir -p busco_downloads/lineages
|mkdir -p busco_downloads/lineages
Line 110: Line 104:


==== Test ====
==== Test ====
7. Download a genome file.
<b>7.</b> Download a genome file.


{{Commands
{{Commands
Line 116: Line 110:
}}
}}


'''8.''' Run.
<b>8.</b> Run.


Command to run a single genome:
Command to run a single genome:
Line 122: Line 116:
{{Command|busco --offline --in genome.fna --out TEST --lineage_dataset bacteria_odb10 --mode genome --cpu ${SLURM_CPUS_PER_TASK:-1} --download_path busco_downloads/}}
{{Command|busco --offline --in genome.fna --out TEST --lineage_dataset bacteria_odb10 --mode genome --cpu ${SLURM_CPUS_PER_TASK:-1} --download_path busco_downloads/}}


Command to run multiple genomes saved in the '''genome/''' directory (as written here, the genome folder must be in the current directory; otherwise, provide the full path):
Command to run multiple genomes saved in the <i>genome/</i> directory (in this example, the <i>genome/</i> folder must be in the current directory; otherwise, provide the full path):


{{Command|busco --offline --in genome/ --out TEST --lineage_dataset bacteria_odb10 --mode genome --cpu ${SLURM_CPUS_PER_TASK:-1} --download_path busco_downloads/}}
{{Command|busco --offline --in genome/ --out TEST --lineage_dataset bacteria_odb10 --mode genome --cpu ${SLURM_CPUS_PER_TASK:-1} --download_path busco_downloads/}}
Line 128: Line 122:
The single genome command should take less than 60 seconds to complete. Production runs which take longer must be submitted to the [[Running jobs|scheduler]].
The single genome command should take less than 60 seconds to complete. Production runs which take longer must be submitted to the [[Running jobs|scheduler]].
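
The commands above pass <code>--cpu ${SLURM_CPUS_PER_TASK:-1}</code>. As a small sketch of how that shell expansion behaves, the snippet below prints the value that would be passed: 1 when the variable is unset (for example on a login node), or the allocated count inside a job. The function name <code>busco_cpus</code> is hypothetical and used only for illustration.

```shell
# Hypothetical helper: print the value that ${SLURM_CPUS_PER_TASK:-1}
# expands to in the current environment.
busco_cpus() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Outside a job the variable is unset, so the default of 1 is used.
echo "unset: $(unset SLURM_CPUS_PER_TASK; busco_cpus)"

# Inside a job the scheduler sets it; 4 is simulated here.
echo "set to 4: $(SLURM_CPUS_PER_TASK=4; busco_cpus)"
```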


===== Busco tips =====
===== BUSCO tips =====


Specify <code>--in genome.fna</code> for single file analysis,
Specify <code>--in genome.fna</code> for single file analysis.


Specify <code>--in genome/</code> for multiple-file analysis.
Specify <code>--in genome/</code> for multiple-file analysis.
Line 176: Line 170:


====Augustus parameters====
====Augustus parameters====
9. For advanced users who want to use Augustus parameters: <code>--augustus_parameters="--yourAugustusParameter".</code>
<b>9.</b> Advanced users may want to use Augustus parameters: <code>--augustus_parameters="--yourAugustusParameter"</code>.


Copy the Augustus config directory to a writable location:
*Copy the Augustus <code>config</code> directory to a writable location.
{{Command|cp -r $EBROOTAUGUSTUS/config $HOME/augustus_config}}
{{Command|cp -r $EBROOTAUGUSTUS/config $HOME/augustus_config}}


Make sure to define the <code>AUGUSTUS_CONFIG_PATH</code> environment variable:
*Make sure to define the <code>AUGUSTUS_CONFIG_PATH</code> environment variable.
{{Command|export AUGUSTUS_CONFIG_PATH{{=}}$HOME/augustus_config}}
{{Command|export AUGUSTUS_CONFIG_PATH{{=}}$HOME/augustus_config}}


====SEPP parameters====
====SEPP parameters====
10. To use SEPP parameters, you need to install SEPP locally in your virtual environment. This should be done on a login node.
<b>10.</b> To use SEPP parameters, you need to install SEPP locally in your virtual environment. This should be done on a login node.


10.1. Activate your BUSCO virtual environment:
<b>10.1.</b> Activate your BUSCO virtual environment.
{{Commands
{{Commands
|source ~/busco_env/bin/activate
|source ~/busco_env/bin/activate
}}
}}


10.2. Install dendropy:
<b>10.2.</b> Install DendroPy.
{{Commands
{{Commands
|pip install 'dendropy<4.6'
|pip install 'dendropy<4.6'
}}
}}


'''10.3.''' Install SEPP:
<b>10.3.</b> Install SEPP.
{{Commands
{{Commands
|git clone https://github.com/smirarab/sepp.git
|git clone https://github.com/smirarab/sepp.git
Line 205: Line 199:
}}
}}


'''10.4.''' Validate the installation:
<b>10.4.</b> Validate the installation.
{{Commands
{{Commands
|cd
|cd
Line 211: Line 205:
}}
}}


'''10.5.''' Because SEPP is installed locally, you cannot create the virtual environment as described in the earlier submission script. Instead, add this command, which activates your local virtual environment, immediately after the line that loads the modules:
<b>10.5.</b> Because SEPP is installed locally, you cannot create the virtual environment as described in the previous submission script. To activate your local virtual environment, add the following command immediately after the line that loads the modules:
{{Commands
{{Commands
|source ~/busco_env/bin/activate
|source ~/busco_env/bin/activate
}}
}}


== Modules ==  
== Modules ==  
Line 221: Line 214:
{{Warning
{{Warning
|title=Deprecation
|title=Deprecation
|content=This section is outdated. We are currently working on updating it.
|content=This section is outdated. We are currently working on an update.
}}
}}


'''1.''' Load the necessary modules:
<b>1.</b> Load the necessary modules.
{{Command|module load StdEnv/2018.3 gcc/7.3.0 openmpi/3.1.4 busco/3.0.2 r/4.0.2}}
{{Command|module load StdEnv/2018.3 gcc/7.3.0 openmpi/3.1.4 busco/3.0.2 r/4.0.2}}
This will also load modules for <code>augustus, blast+, hmmer</code> and some other
This will also load modules for Augustus, BLAST+, HMMER and some other
software packages that BUSCO relies upon.
software packages that BUSCO relies upon.


'''2.''' Copy the configuration file:
<b>2.</b> Copy the configuration file.
{{Command|cp -v $EBROOTBUSCO/config/config.ini.default $HOME/busco_config.ini}}
{{Command|cp -v $EBROOTBUSCO/config/config.ini.default $HOME/busco_config.ini}}
or
or
{{Command|wget -O $HOME/busco_config.ini https://gitlab.com/ezlab/busco/raw/master/config/config.ini.default}}
{{Command|wget -O $HOME/busco_config.ini https://gitlab.com/ezlab/busco/raw/master/config/config.ini.default}}


'''3.''' Edit the configuration file. The locations of external tools are all specified in the last section, which is shown below:
<b>3.</b> Edit the configuration file. The locations of external tools are all specified in the last section, which is shown below:
{{File
{{File
   |name=partial_busco_config.ini
   |name=partial_busco_config.ini
Line 272: Line 265:
}}
}}


'''4.''' Copy the Augustus config directory to a writable location:
<b>4.</b> Copy the Augustus <code>config</code> directory to a writable location.
{{Command|cp -r $EBROOTAUGUSTUS/config $HOME/augustus_config}}
{{Command|cp -r $EBROOTAUGUSTUS/config $HOME/augustus_config}}


'''5.''' Check that it runs.
<b>5.</b> Check that it runs.


{{Commands
{{Commands
Line 288: Line 281:
= Troubleshooting =
= Troubleshooting =
== Cannot write to Augustus config path ==
== Cannot write to Augustus config path ==
Make sure you have copied the config directory to a writable location and exported the <code>AUGUSTUS_CONFIG_PATH</code> variable.
Make sure you have copied the <i>config</i> directory to a writable location and exported the <code>AUGUSTUS_CONFIG_PATH</code> variable.
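
As a rough sketch of that check, the shell snippet below reports whether <code>AUGUSTUS_CONFIG_PATH</code> is set and points to a writable directory. The function name <code>check_augustus_config</code> and its messages are hypothetical; they are not produced by BUSCO or Augustus.

```shell
# Hypothetical helper: verify that AUGUSTUS_CONFIG_PATH is exported
# and refers to a directory the current user can write to.
check_augustus_config() {
    if [ -z "${AUGUSTUS_CONFIG_PATH:-}" ]; then
        echo "AUGUSTUS_CONFIG_PATH is not set"
        return 1
    fi
    if [ ! -d "$AUGUSTUS_CONFIG_PATH" ] || [ ! -w "$AUGUSTUS_CONFIG_PATH" ]; then
        echo "not a writable directory: $AUGUSTUS_CONFIG_PATH"
        return 1
    fi
    echo "ok: $AUGUSTUS_CONFIG_PATH"
}

# Report the status without aborting the shell on failure.
check_augustus_config || true
```

If the variable is unset or the directory is not writable, repeat the copy and export steps shown in the Augustus parameters section.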