BLAST: Difference between revisions

{{Draft}}
<languages />
 
<translate>


BLAST ("Basic Local Alignment Search Tool") finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches.


BLAST searches can be run over the Internet using the [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI site], but you '''should not do this''' for production work on a Compute Canada cluster.  Instead, load the BLAST+ [[Utiliser des modules/en|module]] and use a search database stored on the cluster.
 
Some frequently-used sequence databases are installed on Compute Canada clusters.  See [[Genomics data]].




</translate>

Revision as of 14:53, 8 January 2019



Performance

Here are some things to try to speed up a BLAST search on a cluster:

  • Copy your FASTA database to node-local storage ($SLURM_TMPDIR) and run makeblastdb at the beginning of your job script, so that the BLAST database is built on storage local to the node.
  • Use multi-threading (option -num_threads). Beware that BLAST does not scale efficiently with the number of threads; test to determine a suitable number.
  • Lower the number of hits returned (the options -max_target_seqs and -max_hsps can help), if that is reasonable for your research.
  • Restrict the hit list to near-identical hits with an E-value filter (-evalue), if that is reasonable for your research.
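
The suggestions in this list can be combined in one job. The following is a minimal Python sketch, not a tested production recipe. It assumes a nucleotide search with blastn and that the BLAST+ module is already loaded. The file names, the database name localdb, and the default values for the E-value and hit limits are hypothetical placeholders; choose values that suit your own data.

```python
import os
import shutil
import subprocess
from pathlib import Path


def build_blast_commands(fasta, query, out_file, tmpdir, threads=4,
                         evalue="1e-50", max_target_seqs=10, max_hsps=1):
    """Return (makeblastdb_cmd, blastn_cmd) as argument lists.

    All default values are illustrative assumptions, not recommendations.
    """
    # The FASTA file is expected to have been copied into tmpdir already.
    local_fasta = str(Path(tmpdir) / Path(fasta).name)
    db_prefix = str(Path(tmpdir) / "localdb")  # hypothetical database name

    makeblastdb_cmd = [
        "makeblastdb",
        "-in", local_fasta,
        "-dbtype", "nucl",   # assumption: nucleotide sequences
        "-out", db_prefix,
    ]
    blastn_cmd = [
        "blastn",
        "-query", str(query),
        "-db", db_prefix,
        "-num_threads", str(threads),
        "-max_target_seqs", str(max_target_seqs),
        "-max_hsps", str(max_hsps),
        "-evalue", str(evalue),
        "-outfmt", "6",      # tabular output
        "-out", str(out_file),
    ]
    return makeblastdb_cmd, blastn_cmd


def run_search(fasta, query, out_file):
    """Copy the FASTA file to node-local storage, build the database, search."""
    tmpdir = os.environ["SLURM_TMPDIR"]
    # Use the number of cores Slurm allocated to the task, defaulting to 1.
    threads = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
    shutil.copy(fasta, tmpdir)
    make_cmd, blast_cmd = build_blast_commands(
        fasta, query, out_file, tmpdir, threads=threads)
    subprocess.run(make_cmd, check=True)
    subprocess.run(blast_cmd, check=True)
```

Inside a job, a call such as run_search("ref.fa", "queries.fa", "hits.tsv") would run both steps. Timing the same search with a few different thread counts is a simple way to find out where additional threads stop helping.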