Samtools: Difference between revisions



With the sorted BAM files and their respective index files (extension .bai), you are all set for any downstream process, such as variant calling or feature counting.
=== Processing multiple files with multithreading and/or GNU parallel ===
You likely have more than one file. A simple job submission script with a <tt>for</tt> loop can automate the above commands to process multiple files.
{{File
|name=samtools.sh
|lang="bash"
|contents=
#!/bin/bash
#SBATCH --account=def-prof_username             
#SBATCH --cpus-per-task 1
#SBATCH --mem-per-cpu=4G     
#SBATCH --time=0-3:00
#SBATCH --output=%x-%j.out
module load samtools/1.12
for files in *.sam
do
time samtools view -b "${files}" {{!}} samtools sort -o "${files%.*}_mt_sorted.bam"
done
}}
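Because downstream tools expect a .bai index next to each sorted BAM file, the same loop can also create the index. The following is a minimal sketch rather than a tested recipe: the helper name <tt>sorted_name</tt> is hypothetical, and the loop is skipped when <tt>samtools</tt> is not on the PATH.

```shell
#!/bin/bash
# Hypothetical helper: output name the loop uses for a given SAM file.
sorted_name() {
    echo "${1%.*}_mt_sorted.bam"
}

# Only run when samtools is available (for example after "module load samtools").
if command -v samtools >/dev/null 2>&1; then
    for f in *.sam; do
        [ -e "$f" ] || continue    # no .sam files in this directory
        out="$(sorted_name "$f")"
        # Convert and sort, then write the index as ${out}.bai
        samtools view -b "$f" | samtools sort -o "$out" && samtools index "$out"
    done
fi
```

With this naming, <tt>sample1.sam</tt> would produce <tt>sample1_mt_sorted.bam</tt> and <tt>sample1_mt_sorted.bam.bai</tt>.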
Samtools runs on a single core by default, but you can use multithreading and GNU parallel to improve the overall efficiency of your pipeline.
To multithread a task and improve CPU efficiency, use the <tt>-@</tt> flag, which sets the number of additional threads.
{{File
|name=samtools_multithreading.sh
|lang="bash"
|contents=
#!/bin/bash
#SBATCH --account=def-prof_username
#SBATCH --cpus-per-task 4
#SBATCH --mem-per-cpu=4G  # memory; default unit is megabytes
#SBATCH --time=0-3:00      # time (DD-HH:MM)
#SBATCH --output=%x-%j.out
module load samtools/1.12
for files in *.sam
do
time samtools view -@ 4 -b "${files}" {{!}} samtools sort -o "${files%.*}_mt_sorted.bam"
done
}}
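Rather than hard-coding the thread count, a job script can derive it from the <tt>SLURM_CPUS_PER_TASK</tt> variable that Slurm sets when <tt>--cpus-per-task</tt> is requested. The sketch below is one possible approach, not the only one. The helper name <tt>extra_threads</tt> is hypothetical, and it relies on <tt>-@</tt> counting threads in addition to the main thread. Passing <tt>-@</tt> to <tt>samtools sort</tt> as well is a choice made here for illustration.

```shell
#!/bin/bash
# Hypothetical helper: number of additional samtools threads to request.
# Falls back to a single CPU when run outside a Slurm job.
extra_threads() {
    local cpus="${SLURM_CPUS_PER_TASK:-1}"
    # -@ counts threads in addition to the main thread, so subtract one.
    echo $(( cpus > 1 ? cpus - 1 : 0 ))
}

# Only run when samtools is available (for example after "module load samtools").
if command -v samtools >/dev/null 2>&1; then
    for f in *.sam; do
        [ -e "$f" ] || continue    # no .sam files in this directory
        samtools view -b "$f" | samtools sort -@ "$(extra_threads)" -o "${f%.*}_mt_sorted.bam"
    done
fi
```

Changing <tt>--cpus-per-task</tt> in the job header then adjusts the thread count without editing the command line.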
You can also use GNU parallel to process multiple SAM files concurrently. Please note that GNU parallel is available by default on Cedar, Graham, Narval and Béluga.
{{File
|name=samtools_gnuparallel.sh
|lang="bash"
|contents=
#!/bin/bash
#SBATCH --account=def-prof_username
#SBATCH --cpus-per-task 4
#SBATCH --mem-per-cpu=4G  # memory; default unit is megabytes
#SBATCH --time=0-3:00      # time (DD-HH:MM)
#SBATCH --output=%x-%j.out
module load samtools/1.12
find . -name "*.sam" {{!}} parallel -j 4 "time samtools view -b {} {{!}} samtools sort -o {.}_mt_sorted.bam"
}}
The above script runs view and sort on four SAM files concurrently. If you have more input files, increase both the number of cores requested with <tt>--cpus-per-task</tt> and the value of <tt>-j</tt> to match. Please note that if you omit the <tt>-j</tt> flag, your job will run on all available CPU cores.
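To keep <tt>-j</tt> in step with the cores requested, the job count can be read from <tt>SLURM_CPUS_PER_TASK</tt> instead of being typed twice. This is a minimal sketch under that assumption: the helper name <tt>parallel_jobs</tt> is hypothetical, and the command is skipped when <tt>parallel</tt> or <tt>samtools</tt> is not on the PATH.

```shell
#!/bin/bash
# Hypothetical helper: number of concurrent GNU parallel jobs.
# Uses the Slurm allocation, or 1 when run outside a Slurm job.
parallel_jobs() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Only run when both tools are available.
if command -v parallel >/dev/null 2>&1 && command -v samtools >/dev/null 2>&1; then
    # One single-threaded view/sort pipeline per allocated core.
    find . -name "*.sam" | parallel -j "$(parallel_jobs)" \
        "samtools view -b {} | samtools sort -o {.}_mt_sorted.bam"
fi
```

With <tt>--cpus-per-task 4</tt> in the job header, this starts at most four pipelines at a time.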