With sorted BAM files and their respective index files (extension <tt>.bai</tt>), you are all set for downstream processes such as variant calling or feature counting.
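Before moving on, you may want to confirm that every sorted BAM file has the <tt>.bai</tt> index mentioned above. The following is a minimal sketch in plain bash. The function name <tt>missing_index</tt> is hypothetical. It accepts either index naming convention: <tt>sample.bam.bai</tt>, which is what <tt>samtools index</tt> writes by default, or <tt>sample.bai</tt>.

```bash
#!/bin/bash
# Hypothetical helper: print every BAM file in a directory that has no index.
# Accepts either sample.bam.bai (samtools index default) or sample.bai.
# The body runs in a subshell ( ... ) so that nullglob does not leak into the caller.
missing_index() (
    dir="${1:-.}"
    shopt -s nullglob   # a directory without BAM files yields no iterations
    for bam in "${dir}"/*.bam; do
        if [[ ! -e "${bam}.bai" && ! -e "${bam%.bam}.bai" ]]; then
            echo "${bam}"
        fi
    done
)
```

Any file it lists can then be indexed with <tt>samtools index</tt> before the downstream step.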
=== Processing multiple files with multithreading and/or GNU parallel ===
You will likely have more than one file to process. A simple job submission script with a <tt>for</tt> loop can automate the commands above for every SAM file in the working directory.
{{File
|name=samtools.sh
|lang="bash"
|contents=
#!/bin/bash
#SBATCH --account=def-prof_username
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4G   # memory; default unit is megabytes
#SBATCH --time=0-3:00      # time (DD-HH:MM)
#SBATCH --output=%x-%j.out
module load samtools/1.12
# Convert each SAM file to BAM and sort it; the output is <name>_mt_sorted.bam
for file in *.sam
do
time samtools view -b "${file}" {{!}} samtools sort -o "${file%.*}_mt_sorted.bam"
done
}}
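The loop derives each output name with the bash parameter expansion <tt>%.*</tt>, which strips the shortest trailing <tt>.extension</tt> from the file name. Below is a minimal sketch of that rule wrapped in a hypothetical helper, <tt>sorted_name</tt>, together with a dry-run loop that only prints the commands it would run, so you can check the names before submitting a job. The <tt>nullglob</tt> option is an addition that is not in the script: without it, a directory with no SAM files makes the loop run once on the literal string <tt>*.sam</tt>.

```bash
#!/bin/bash
# Hypothetical helper: map an input SAM name to the sorted BAM name used in samtools.sh.
# ${1%.*} removes the shortest trailing ".something", so "sample1.sam" -> "sample1".
sorted_name() {
    echo "${1%.*}_mt_sorted.bam"
}

# Dry run: print the pipeline for each SAM file instead of executing it.
# nullglob makes *.sam expand to nothing when no SAM files are present.
shopt -s nullglob
for file in *.sam; do
    echo "samtools view -b ${file} | samtools sort -o $(sorted_name "${file}")"
done
shopt -u nullglob
```

Because only the last extension is stripped, a file such as <tt>run.v2.sam</tt> becomes <tt>run.v2_mt_sorted.bam</tt>.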
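By default, samtools runs on a single core, but you can use multithreading and/or GNU parallel to make better use of the cores allocated to your job.
The <tt>-@</tt> flag tells samtools to use additional threads (for <tt>samtools view</tt>, BAM compression threads on top of the main thread). Request a matching number of cores with <tt>--cpus-per-task</tt> so that those threads have cores to run on.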
{{File
|name=samtools_multithreading.sh
|lang="bash"
|contents=
#!/bin/bash
#SBATCH --account=def-prof_username
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=4G   # memory; default unit is megabytes
#SBATCH --time=0-3:00      # time (DD-HH:MM)
#SBATCH --output=%x-%j.out
module load samtools/1.12
for file in *.sam
do
# -@ 4 matches the 4 cores requested with --cpus-per-task above
time samtools view -@ 4 -b "${file}" {{!}} samtools sort -o "${file%.*}_mt_sorted.bam"
done
}}
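Hard-coding the thread count makes it easy for <tt>-@</tt> and <tt>--cpus-per-task</tt> to drift apart. One way to keep them in sync is to read the thread count from the environment. This sketch assumes the job was submitted with <tt>--cpus-per-task</tt>, in which case Slurm exports <tt>SLURM_CPUS_PER_TASK</tt>. It falls back to a single thread when the variable is unset, for example when testing outside a job. The helper name <tt>thread_count</tt> is hypothetical, and the loop only prints the commands.

```bash
#!/bin/bash
# Hypothetical helper: number of threads to pass to samtools -@.
# Uses SLURM_CPUS_PER_TASK when Slurm sets it, otherwise falls back to 1.
thread_count() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

threads=$(thread_count)
shopt -s nullglob   # no iterations if there are no SAM files
for file in *.sam; do
    # Dry run: print the pipeline that would run for this file.
    echo "samtools view -@ ${threads} -b ${file} | samtools sort -o ${file%.*}_mt_sorted.bam"
done
shopt -u nullglob
```

With this approach, changing <tt>--cpus-per-task</tt> in the job header is the only edit needed to change the thread count.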
You can also use GNU parallel to process several SAM files concurrently. GNU parallel is available by default on Cedar, Graham, Narval and Béluga.
{{File
|name=samtools_gnuparallel.sh
|lang="bash"
|contents=
#!/bin/bash
#SBATCH --account=def-prof_username
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=4G   # memory; default unit is megabytes
#SBATCH --time=0-3:00      # time (DD-HH:MM)
#SBATCH --output=%x-%j.out
module load samtools/1.12
# Run up to 4 view/sort pipelines at once, one per SAM file; {.} is the input path without its extension
find . -name "*.sam" {{!}} parallel -j 4 "time samtools view -b {} {{!}} samtools sort -o {.}_mt_sorted.bam"
}}
The script above runs <tt>view</tt> and <tt>sort</tt> on four SAM files concurrently. To process more files at once, increase both <tt>--cpus-per-task</tt> and <tt>-j</tt> by the same amount. If you omit <tt>-j</tt>, GNU parallel by default starts one job per available CPU core.
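Running more parallel jobs than there are SAM files does not speed anything up, and running more jobs than allocated cores oversubscribes the job. The minimal sketch below picks the <tt>-j</tt> value as the smaller of the two. It assumes the core count comes from <tt>SLURM_CPUS_PER_TASK</tt> (falling back to 1) and that the SAM files are in the current directory. The helper <tt>jobs_for</tt> is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: choose the GNU parallel -j value.
# $1 = number of input files, $2 = number of allocated cores.
# Prints the smaller of the two, but never less than 1.
jobs_for() {
    local n_files=$1 n_cores=$2
    local n=$(( n_files < n_cores ? n_files : n_cores ))
    (( n < 1 )) && n=1
    echo "${n}"
}

shopt -s nullglob
sam_files=( *.sam )   # SAM files in the current directory only
shopt -u nullglob
n_jobs=$(jobs_for "${#sam_files[@]}" "${SLURM_CPUS_PER_TASK:-1}")
echo "parallel -j ${n_jobs} for ${#sam_files[@]} SAM file(s)"

# Inside a job script, the value could then be passed on, e.g.:
#   parallel -j "${n_jobs}" "samtools view -b {} | samtools sort -o {.}_mt_sorted.bam" ::: "${sam_files[@]}"
```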