BLAST: Difference between revisions

add intro
(Created page with " Tips to accelerate your blast (https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download) search on computer cluster like Cedar: # try copying...")
 
(add intro)


BLAST ("Basic Local Alignment Search Tool") finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches.


BLAST searches can be run over the Internet using the [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI site], but you '''should not do this''' for production work on a Compute Canada cluster. Instead, load the BLAST+ [[Utiliser des modules/en|module]] and search against a database stored on the cluster. (MORE TO COME on available databases and how to access them, as well as on downloading and preparing your own database.)


== Performance ==
 
Here are some things to try in order to accelerate your BLAST search on a computer cluster:
 
* Copy your FASTA database to node-local storage (<code>$SLURM_TMPDIR</code>) and run <code>makeblastdb</code> at the beginning of your job script, so that the BLAST database is built and read on the compute node itself.
* Use multi-threading (option <code>-num_threads</code>). Beware that BLAST does not scale very efficiently with thread count; test to determine a suitable number of threads.
* Lower the number of hits returned (<code>-max_target_seqs</code> and <code>-max_hsps</code> can help), if it is reasonable for your research.
* Limit your hit list to near-identical hits with an E-value filter (<code>-evalue</code>), if it is reasonable for your research.
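
The tips above could be combined in a single job script, for example as in the minimal sketch below. It is an illustration only, not a recommended configuration. The file paths, resource requests, the nucleotide database type, the use of <code>blastn</code>, and the filter values (<code>-max_target_seqs 5</code>, <code>-max_hsps 1</code>, <code>-evalue 1e-20</code>) are all assumptions to adapt. The <code>run</code> helper is a hypothetical convenience that prints the commands instead of executing them when <code>DRY_RUN=1</code> or when BLAST+ is not on the <code>PATH</code>.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00
# The resource requests above are placeholders; adjust them for your search.

# On the cluster, load BLAST+ first. The exact module name is an assumption:
# module load blast+

# Hypothetical input and output paths; override them from the environment.
DB_FASTA="${DB_FASTA:-$HOME/blast/ref.fasta}"
QUERY="${QUERY:-$HOME/blast/query.fasta}"
RESULTS="${RESULTS:-$HOME/blast/results.tsv}"

# Node-local storage; fall back to /tmp when not inside a Slurm job.
LOCAL_DIR="${SLURM_TMPDIR:-/tmp}"
# Match the BLAST thread count to the CPUs requested from Slurm.
THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Hypothetical helper: print commands instead of running them when
# DRY_RUN=1 is set or when BLAST+ is not available.
if ! command -v makeblastdb >/dev/null 2>&1; then DRY_RUN=1; fi
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 1. Copy the FASTA database to node-local storage and format it there.
#    "-dbtype nucl" assumes a nucleotide database; use "prot" for proteins.
run cp "$DB_FASTA" "$LOCAL_DIR/db.fasta"
run makeblastdb -in "$LOCAL_DIR/db.fasta" -dbtype nucl -out "$LOCAL_DIR/db"

# 2. Search with multi-threading, a short hit list and a strict E-value.
#    The numeric values are illustrative assumptions only.
run blastn -query "$QUERY" -db "$LOCAL_DIR/db" \
    -num_threads "$THREADS" \
    -max_target_seqs 5 -max_hsps 1 -evalue 1e-20 \
    -outfmt 6 -out "$RESULTS"
```

Submitting the script with <code>DRY_RUN=1</code> set in the environment shows the exact commands before any real work is done. Timing a few short runs with different <code>--cpus-per-task</code> values is one way to find a suitable thread count.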
 
== References ==