BUSCO (<i>Benchmarking sets of Universal Single-Copy Orthologs</i>) is an application for assessing genome assembly and annotation completeness.
For more information, see the [https://busco.ezlab.org/busco_userguide.html user manual].
== Available versions ==
Recent versions are available as wheels. Older versions are available as a module; see the [[#Modules|Modules]] section below.
To see the latest available version, run
{{Command|avail_wheel busco}}
== Python wheel ==
=== Installation ===
<b>1.</b> Load the necessary modules.
{{Commands
|module load StdEnv/2020 gcc python/3.10 augustus hmmer blast+ metaeuk prodigal r bbmap}}
<b>2.</b> Create the virtual environment.
{{Commands
|virtualenv ~/busco_env
}}
<b>3.</b> Install the wheel and its dependencies.
{{Command
|prompt=(busco_env) $
}}
<b>4.</b> Validate it.
{{Command
|prompt=(busco_env) $
}}
<b>5.</b> Freeze the environment and its set of requirements. To use the requirements text file, see the <i>bash</i> submission script in step 8.
{{Command
|prompt=(busco_env) $
=== Usage ===
==== Datasets ====
<b>6.</b> You must pre-download any datasets from [https://busco-data.ezlab.org/v5/data/ BUSCO data] before submitting your job.
You can list the available datasets in your terminal by typing <code>busco --list-datasets</code>.
You have <b>two</b> options to download datasets:<br>
*use the <code>busco</code> command,
*use the <code>wget</code> command.
===== <b>6.1</b> Using the <code>busco</code> command =====
This is the preferred option. Type this command in your working directory to download a particular dataset, for example:
{{Commands
|busco --download bacteria_odb10
}}
It is also possible to do a bulk download by replacing the dataset name with one of the following arguments: <code>all</code>, <code>prokaryota</code>, <code>eukaryota</code>, or <code>virus</code>, for example:
{{Commands
|busco --download virus
}}
This will
::1. create a BUSCO directory hierarchy for the datasets,
::2. download the appropriate datasets,
::3. decompress the file(s),
::4. if you download multiple files, they will all be automatically added to the lineages directory.
The hierarchy will look like this:
<blockquote>
* busco_downloads/
</blockquote>
All your lineage files should then be in <b>busco_downloads/lineages/</b>. When you pass <code>--download_path busco_downloads/</code> on the BUSCO command line, BUSCO knows where to find the dataset named by <code>--lineage_dataset bacteria_odb10</code>. If the <i>busco_downloads</i> directory is not in your working directory, you will need to provide its full path.
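Before submitting an <code>--offline</code> job, you may want to confirm that the lineage dataset is actually present. The following is a minimal sketch, assuming the <i>busco_downloads/lineages/</i> layout shown above; <code>check_lineage</code> is a hypothetical helper, not part of BUSCO.

```bash
# check_lineage: hypothetical helper, not part of BUSCO.
# Confirms that LINEAGE exists under DOWNLOAD_PATH/lineages/ before an
# --offline run, assuming the busco_downloads/ layout created by
# "busco --download".
# Usage: check_lineage DOWNLOAD_PATH LINEAGE
check_lineage() {
    local download_path="${1%/}"   # drop one trailing slash, if present
    local lineage="$2"
    local dir="${download_path}/lineages/${lineage}"
    if [ -d "$dir" ]; then
        echo "found: $dir"
        return 0
    fi
    echo "missing: $dir (download it on a login node first)" >&2
    return 1
}
```

For example, <code>check_lineage busco_downloads/ bacteria_odb10</code> prints the dataset path if it exists and returns a non-zero status otherwise, so a job script can stop before calling BUSCO.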
===== <b>6.2</b> Using the <code>wget</code> command =====
All files must be decompressed with <code>tar -xvf file.tar.gz</code>.
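If several archives were fetched into <i>busco_downloads/lineages/</i>, they can be extracted in one pass. This is a sketch only; <code>decompress_lineages</code> is a hypothetical helper, and it assumes every archive is a gzip-compressed tarball ending in <code>.tar.gz</code>.

```bash
# decompress_lineages: hypothetical helper, not part of BUSCO.
# Extracts every *.tar.gz archive in DIR into DIR itself, assuming each
# archive is a gzip-compressed tarball (the same effect as running
# "tar -xvf file.tar.gz" on each one, without the verbose listing).
# Usage: decompress_lineages busco_downloads/lineages
decompress_lineages() {
    local dir="$1" archive
    for archive in "$dir"/*.tar.gz; do
        [ -e "$archive" ] || continue   # no match: the glob stays literal
        tar -xzf "$archive" -C "$dir" || return 1
        echo "extracted: $(basename "$archive")"
    done
}
```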
{{Commands
|mkdir -p busco_downloads/lineages
==== Test ====
<b>7.</b> Download a genome file.
{{Commands
}}
<b>8.</b> Run.
Command to run a single genome:
{{Command|busco --offline --in genome.fna --out TEST --lineage_dataset bacteria_odb10 --mode genome --cpu ${SLURM_CPUS_PER_TASK:-1} --download_path busco_downloads/}}
Command to run multiple genomes saved in the <i>genome/</i> directory (in this example, the <i>genome/</i> directory must be in the current directory; otherwise, you need to provide its full path):
{{Command|busco --offline --in genome/ --out TEST --lineage_dataset bacteria_odb10 --mode genome --cpu ${SLURM_CPUS_PER_TASK:-1} --download_path busco_downloads/}}
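The two commands above differ only in the <code>--in</code> argument. The sketch below wraps them in a hypothetical <code>run_busco</code> function. <code>${SLURM_CPUS_PER_TASK:-1}</code> expands to the number of CPUs allocated by Slurm, or to 1 when the variable is unset (for example, on a login node). The <code>BUSCO_BIN</code> override is an assumption added only to allow a dry run; it is not a BUSCO setting. The download path defaults to <code>busco_downloads/</code>, the directory created by <code>busco --download</code>.

```bash
# run_busco: hypothetical wrapper around the step 8 command line.
# INPUT may be a single file (genome.fna) or a directory (genome/).
# --cpu uses SLURM_CPUS_PER_TASK and falls back to 1 when it is unset.
# BUSCO_BIN is a hypothetical override (e.g. BUSCO_BIN=echo for a dry run);
# it defaults to the real busco executable.
# Usage: run_busco INPUT OUTPUT LINEAGE [DOWNLOAD_PATH]
run_busco() {
    local input="$1" out="$2" lineage="$3"
    local download_path="${4:-busco_downloads/}"
    "${BUSCO_BIN:-busco}" --offline --in "$input" --out "$out" \
        --lineage_dataset "$lineage" --mode genome \
        --cpu "${SLURM_CPUS_PER_TASK:-1}" --download_path "$download_path"
}
```

For example, <code>BUSCO_BIN=echo run_busco genome/ TEST bacteria_odb10</code> prints the arguments that would be passed to BUSCO without running it.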
The single genome command should take less than 60 seconds to complete. Production runs that take longer must be submitted to the [[Running jobs|scheduler]].
===== BUSCO tips =====
Specify <code>--in genome.fna</code> for single file analysis.
Specify <code>--in genome/</code> for multiple files analysis.
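A quick way to see which of these two cases a path falls into is sketched below. <code>describe_busco_input</code> is a hypothetical helper; it only counts regular files in a directory and does not check that they are valid FASTA files.

```bash
# describe_busco_input: hypothetical helper, not part of BUSCO.
# Reports whether a --in argument is a single file or a directory, and how
# many regular files the directory holds. It does not validate file contents.
# Usage: describe_busco_input PATH
describe_busco_input() {
    local path="$1" n=0 f
    if [ -f "$path" ]; then
        echo "single file: $path"
    elif [ -d "$path" ]; then
        for f in "$path"/*; do
            [ -f "$f" ] && n=$((n + 1))   # count regular files only
        done
        if [ "$n" -eq 0 ]; then
            echo "empty directory: $path" >&2
            return 1
        fi
        echo "directory with $n file(s): $path"
    else
        echo "not found: $path" >&2
        return 1
    fi
}
```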
====Augustus parameters====
<b>9.</b> Advanced users may want to use Augustus parameters: <code>--augustus_parameters="--yourAugustusParameter"</code>.
*Copy the Augustus <code>config</code> directory to a writable location.
{{Command|cp -r $EBROOTAUGUSTUS/config $HOME/augustus_config}}
*Make sure to define the <code>AUGUSTUS_CONFIG_PATH</code> environment variable.
{{Command|export AUGUSTUS_CONFIG_PATH{{=}}$HOME/augustus_config}}
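The two steps above can be combined into a small function for a job script. This is a sketch, assuming the Augustus module is loaded so that <code>EBROOTAUGUSTUS</code> is set; <code>setup_augustus_config</code> is a hypothetical name. An existing copy is reused rather than overwritten.

```bash
# setup_augustus_config: hypothetical helper combining the copy and export steps.
# Copies $EBROOTAUGUSTUS/config once, then exports AUGUSTUS_CONFIG_PATH.
# An existing TARGET_DIR is left untouched so earlier changes are kept.
# Usage: setup_augustus_config [TARGET_DIR]   (default: $HOME/augustus_config)
setup_augustus_config() {
    local target="${1:-$HOME/augustus_config}"
    if [ -z "${EBROOTAUGUSTUS:-}" ]; then
        echo "EBROOTAUGUSTUS is not set; load the augustus module first" >&2
        return 1
    fi
    if [ ! -d "$target" ]; then
        cp -r "$EBROOTAUGUSTUS/config" "$target" || return 1
    fi
    export AUGUSTUS_CONFIG_PATH="$target"
}
```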
====SEPP parameters====
<b>10.</b> To use SEPP parameters, you need to install SEPP locally in your virtual environment. This should be done on a login node.
<b>10.1.</b> Activate your BUSCO virtual environment.
{{Commands
|source busco_env/bin/activate
}}
<b>10.2.</b> Install DendroPy.
{{Commands
|pip install 'dendropy<4.6'
}}
<b>10.3.</b> Install SEPP.
{{Commands
|git clone https://github.com/smirarab/sepp.git
}}
<b>10.4.</b> Validate the installation.
{{Commands
|cd
}}
<b>10.5.</b> Because SEPP is installed locally, you cannot create the virtual environment as described in the previous submission script. To activate your local virtual environment, simply add the following command immediately under the line to load the module:
{{Commands
|source ~/busco_env/bin/activate
}}
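To make a job script fail early when the environment was not activated, a guard like the following could be added after the <code>source</code> line. It is a sketch that relies on the <code>VIRTUAL_ENV</code> variable set by the activation script; <code>require_venv</code> is a hypothetical helper.

```bash
# require_venv: hypothetical guard for a job script.
# Returns non-zero unless VIRTUAL_ENV (set by bin/activate) matches the
# expected environment path.
# Usage: require_venv [EXPECTED_PATH]   (default: $HOME/busco_env)
require_venv() {
    local expected="${1:-$HOME/busco_env}"
    if [ "${VIRTUAL_ENV:-}" != "$expected" ]; then
        echo "expected virtual environment $expected, got '${VIRTUAL_ENV:-none}'" >&2
        return 1
    fi
}
```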
== Modules ==
{{Warning
|title=Deprecation
|content=This section is outdated. We are currently working on an update.
}}
<b>1.</b> Load the necessary modules.
{{Command|module load StdEnv/2018.3 gcc/7.3.0 openmpi/3.1.4 busco/3.0.2 r/4.0.2}}
This will also load modules for Augustus, BLAST+, HMMER and some other
software packages that BUSCO relies upon.
<b>2.</b> Copy the configuration file.
{{Command|cp -v $EBROOTBUSCO/config/config.ini.default $HOME/busco_config.ini}}
or
{{Command|wget -O $HOME/busco_config.ini https://gitlab.com/ezlab/busco/raw/master/config/config.ini.default}}
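Because <code>cp</code> overwrites an existing file without asking, re-running step 2 would discard edits made in step 3. The following is a minimal sketch of a safer copy, assuming the module sets <code>EBROOTBUSCO</code>; <code>init_busco_config</code> is a hypothetical helper.

```bash
# init_busco_config: hypothetical helper, not part of BUSCO.
# Copies the module's default config only if TARGET_FILE does not exist,
# so a file that was already edited is never overwritten.
# Usage: init_busco_config [TARGET_FILE]   (default: $HOME/busco_config.ini)
init_busco_config() {
    local target="${1:-$HOME/busco_config.ini}"
    if [ -e "$target" ]; then
        echo "keeping existing $target"
        return 0
    fi
    if [ -z "${EBROOTBUSCO:-}" ]; then
        echo "EBROOTBUSCO is not set; load the busco module first" >&2
        return 1
    fi
    cp -v "$EBROOTBUSCO/config/config.ini.default" "$target"
}
```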
<b>3.</b> Edit the configuration file. The locations of external tools are all specified in the last section, which is shown below:
{{File
|name=partial_busco_config.ini
}}
<b>4.</b> Copy the Augustus <code>config</code> directory to a writable location.
{{Command|cp -r $EBROOTAUGUSTUS/config $HOME/augustus_config}}
<b>5.</b> Check that it runs.
{{Commands
= Troubleshooting =
== Cannot write to Augustus config path ==
Make sure you have copied the <i>config</i> directory to a writable location and exported the <code>AUGUSTUS_CONFIG_PATH</code> variable.
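To check these two conditions from a shell or at the top of a job script, a diagnostic along these lines can be used. This is a sketch; <code>check_augustus_config</code> is a hypothetical helper that only tests the variable, the directory, and write permission.

```bash
# check_augustus_config: hypothetical diagnostic, not part of Augustus or BUSCO.
# Verifies that AUGUSTUS_CONFIG_PATH is set, is a directory, and is writable.
# Usage: check_augustus_config
check_augustus_config() {
    if [ -z "${AUGUSTUS_CONFIG_PATH:-}" ]; then
        echo "AUGUSTUS_CONFIG_PATH is not set" >&2
        return 1
    fi
    if [ ! -d "$AUGUSTUS_CONFIG_PATH" ]; then
        echo "not a directory: $AUGUSTUS_CONFIG_PATH" >&2
        return 1
    fi
    if [ ! -w "$AUGUSTUS_CONFIG_PATH" ]; then
        echo "not writable: $AUGUSTUS_CONFIG_PATH" >&2
        return 1
    fi
    echo "ok: $AUGUSTUS_CONFIG_PATH"
}
```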