BLAST ("Basic Local Alignment Search Tool") finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches.
BLAST searches can be run over the Internet using the [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI site], but you '''should not do this''' for production work on a Compute Canada cluster. Instead, load the BLAST+ [[Utiliser des modules/en|module]] and use a search database stored on the cluster. (MORE TO COME on available databases and how to access them, as well as downloading and preparing your own database.)
== Performance == | |||
Here are some things to try in order to accelerate your BLAST search on a computer cluster: | |||
* Copy your FASTA database to node-local storage (<code>$SLURM_TMPDIR</code>) and run <code>makeblastdb</code> at the beginning of your job script, so that the BLAST database is built and read on fast storage local to the node.
* Use multi-threading (option <code>-num_threads</code>). Beware that BLAST does not scale efficiently with the number of threads; run tests to determine a suitable number.
* Lower the number of hits returned (<code>-max_target_seqs</code> and <code>-max_hsps</code> can help), if that is reasonable for your research.
* Restrict the hit list to near-identical hits with a stringent E-value threshold (<code>-evalue</code>), if that is reasonable for your research.
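The tips above can be combined in one job script. The following is a minimal sketch, not a tested recipe: the file names <code>ref.fa</code> and <code>query.fa</code>, the resource requests, the module name <code>blast+</code>, the nucleotide database type and the specific option values are all assumptions for illustration, so adjust them to your own data and check the module name with <code>module spider</code>.

```shell
#!/bin/bash
#SBATCH --time=01:00:00        # assumed values; size these for your own search
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
set -eu

# Hypothetical input files; replace with your own.
DB_FASTA="${DB_FASTA:-ref.fa}"
QUERY="${QUERY:-query.fa}"

# Match the thread count to the cores Slurm allocated (1 if unset).
THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Run blastn with a reduced hit list and a stringent E-value cutoff.
# $1 = database prefix, $2 = query FASTA, $3 = output file
run_search() {
    blastn -db "$1" -query "$2" -out "$3" \
        -num_threads "$THREADS" \
        -max_target_seqs 10 -max_hsps 1 \
        -evalue 1e-20 -outfmt 6
}

# Only do the real work inside a Slurm job, where SLURM_TMPDIR is defined.
if [ -n "${SLURM_TMPDIR:-}" ]; then
    module load blast+         # assumed module name

    # Copy the FASTA file to node-local storage and build the database there.
    cp "$DB_FASTA" "$SLURM_TMPDIR/"
    makeblastdb -in "$SLURM_TMPDIR/$(basename "$DB_FASTA")" \
        -dbtype nucl -out "$SLURM_TMPDIR/refdb"

    run_search "$SLURM_TMPDIR/refdb" "$QUERY" results.tsv
fi
```

For a protein search, the sketch would instead use <code>-dbtype prot</code> with <code>blastp</code>. Submitting the same script with several values of <code>--cpus-per-task</code> and comparing run times is one way to find a suitable number of threads.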
== References == |