BLAST
BLAST ("Basic Local Alignment Search Tool") finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches.
BLAST searches can be run over the Internet using the NCBI site, but you should not do this for production work on a Compute Canada cluster. Instead, load the BLAST+ module and search a database stored on the cluster.
Some frequently-used sequence databases are installed on Compute Canada clusters. See Genomics data.
Performance
Here are some things to try in order to accelerate your BLAST search on a computer cluster:
- Copy your FASTA database to node-local storage ($SLURM_TMPDIR) and run makeblastdb at the beginning of your job script, so that the BLAST database is generated on the node itself.
- Use multi-threading (option -num_threads). Beware that multi-threading in BLAST is not very efficient; test to determine a suitable number of threads.
- Lower the number of hits returned (the options -max_target_seqs and -max_hsps can help), if it is reasonable for your research.
- Restrict your hit list to near-identical hits by setting a strict E-value threshold (option -evalue), if it is reasonable for your research.
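The suggestions above could be combined in a job script along the following lines. This is a minimal sketch, not a recommended configuration: the file names (ref.fa, query.fa, results.tsv), the resource requests, the module name blast+, the nucleotide database type and the threshold values are all assumptions to adapt to your own data and cluster.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00

# Hypothetical input and output files; replace with your own.
FASTA_DB="${FASTA_DB:-ref.fa}"
QUERY="${QUERY:-query.fa}"
RESULTS="${RESULTS:-results.tsv}"

run_blast() {
    # Module name is an assumption; check what is installed on your cluster.
    module load blast+

    # Copy the FASTA database to node-local storage and format it there.
    cp "$FASTA_DB" "$SLURM_TMPDIR/db.fa"
    makeblastdb -in "$SLURM_TMPDIR/db.fa" -dbtype nucl -out "$SLURM_TMPDIR/db"

    # Match the thread count to the allocated cores, and keep only a few
    # strong hits per query (example values, not recommendations).
    blastn -query "$QUERY" -db "$SLURM_TMPDIR/db" \
        -num_threads "${SLURM_CPUS_PER_TASK:-1}" \
        -max_target_seqs 5 -max_hsps 1 -evalue 1e-20 \
        -outfmt 6 -out "$RESULTS"
}

# Run only inside a Slurm job, where SLURM_TMPDIR is defined.
if [ -n "${SLURM_TMPDIR:-}" ]; then
    run_blast
fi
```

For a protein database, -dbtype prot and blastp would take the place of -dbtype nucl and blastn. Timing the same search with a few different values of --cpus-per-task is a simple way to find out where additional threads stop paying off.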