BLAST: Difference between revisions

==== Preprocessing ==== <!--T:16-->
In order to accelerate the search, the <tt>seq.fa</tt> file must be split into smaller chunks. Each chunk should be at least <tt>1MB</tt> in size, '''not smaller''', because many small files may hurt the parallel filesystem.
<!--T:17-->
'''Important''': To correctly split a FASTA file, it must be in its original format and not in multiline format. In other words, each sequence must be on a single line.
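If your FASTA file wraps each sequence across several lines, one way to bring it to the single-line format described above is a short awk filter. This is a minimal sketch, assuming plain-text (uncompressed) input; the function name <tt>linearize_fasta</tt> is hypothetical and not part of any module.

```bash
# Hypothetical helper: convert a multiline FASTA file into single-line format,
# so that every record is exactly two lines (header, then the full sequence).
# Usage: linearize_fasta input.fa > output.fa
linearize_fasta() {
    awk '
        /^>/ {
            # Flush the previous record before printing the next header.
            if (seq != "") print seq
            print
            seq = ""
            next
        }
        {
            # Strip spaces, tabs and carriage returns, then append the line.
            gsub(/[ \t\r]/, "")
            seq = seq $0
        }
        END { if (seq != "") print seq }
    ' "$1"
}
```

For example, <tt>linearize_fasta seq_multiline.fa > seq.fa</tt> writes a file in which every header line is followed by exactly one sequence line.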


<!--T:18-->
Using the <tt>faSplit</tt> utility:
{{Command|module load kentutils/20180716}}
{{Command|faSplit sequence seq.fa 10 seq}}
will create 10 files named <tt>seqN.fa</tt>, where <tt>N</tt> is in the range <tt>[0..9]</tt>, for 10 queries (sequences).
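To check the 1MB guideline after splitting, a small helper can flag any chunk that falls below that size. This is a minimal sketch, assuming 1MB means 1,048,576 bytes and that the chunks follow the <tt>seqN.fa</tt> naming above; <tt>check_chunk_sizes</tt> is a hypothetical name.

```bash
# Hypothetical helper: warn about split chunks smaller than 1 MB
# (assumed here to be 1048576 bytes).
# Usage: check_chunk_sizes seq*.fa
check_chunk_sizes() {
    local min_bytes=1048576 small=0 f size
    for f in "$@"; do
        # Arithmetic expansion drops any padding that some wc versions add.
        size=$(( $(wc -c < "$f") ))
        if [ "$size" -lt "$min_bytes" ]; then
            echo "warning: $f is only $size bytes (< 1 MB)" >&2
            small=$((small + 1))
        fi
    done
    # Return status 0 only if every chunk meets the minimum size.
    [ "$small" -eq 0 ]
}
```

If <tt>check_chunk_sizes seq*.fa</tt> reports undersized chunks, requesting fewer files from <tt>faSplit</tt> produces larger chunks.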


==== Job submission ==== <!--T:19-->
See also [[GNU Parallel#Handling_large_files|Handling large files]] in the GNU Parallel page.


==== Running with multiple cores on one node ====
<!--T:34-->
{{File
{{File
Note: The file must not be compressed.
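If the FASTA file was downloaded as a gzip archive, it has to be decompressed before it is used. This is a minimal sketch, assuming gzip compression only (other formats such as bzip2 or xz are not detected); the helper name <tt>ensure_uncompressed</tt> is hypothetical.

```bash
# Hypothetical helper: make sure a FASTA input is not gzip-compressed.
# Gzip data is recognized by its two magic bytes (1f 8b).
# Prints the path of the uncompressed file to use.
ensure_uncompressed() {
    local in=$1 out
    if [ "$(head -c 2 "$in" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]; then
        # Drop a trailing .gz; otherwise pick a distinct output name.
        out=${in%.gz}
        [ "$out" = "$in" ] && out="$in.unzipped"
        gzip -dc "$in" > "$out" || return 1
        echo "$out"
    else
        echo "$in"
    fi
}
```

Note that the decompressed copy overwrites any existing file with the same name.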


===== Job submission ===== <!--T:31-->
With the above submission script, we can submit our search so that it runs only after the database has been created successfully:
{{Command|sbatch --dependency{{=}}afterok:$(sbatch --parsable makeblastdb.sh) blastn_gnu.sh}}
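For scripted or repeated use, the same two-step submission can be wrapped in a function that captures the database job's ID explicitly. By default <tt>sbatch</tt> prints a full message such as "Submitted batch job 12345", whereas <tt>--parsable</tt> prints only the job ID, optionally followed by <tt>;cluster</tt>. This is a minimal sketch, assuming the script names <tt>makeblastdb.sh</tt> and <tt>blastn_gnu.sh</tt> from the command above; the function name <tt>submit_blast_pipeline</tt> is hypothetical.

```bash
# Hypothetical wrapper: submit the database build, then a search job that
# starts only if the build job finishes successfully (afterok).
submit_blast_pipeline() {
    local db_job
    # --parsable prints "jobid" or "jobid;cluster" instead of a full message.
    db_job=$(sbatch --parsable makeblastdb.sh) || return 1
    db_job=${db_job%%;*}   # keep only the numeric job ID
    sbatch --dependency=afterok:"$db_job" blastn_gnu.sh
}
```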


=== Additional tips === <!--T:32-->