Samtools
This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.
Description
Samtools is a suite of programs for interacting with high-throughput sequencing data. It consists of three separate repositories:
- Samtools: Reading/writing/editing/indexing/viewing SAM/BAM/CRAM format
- BCFtools: Reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants
- HTSlib: A C library for reading/writing high-throughput sequencing data
Samtools and BCFtools both use HTSlib internally, but these source packages contain their own copies of htslib so they can be built independently.
Note: This page does not cover all features of SAMtools. Please refer to Samtools for the complete list of all subtools.
Availability and Loading Module
To query all available version of SAMtools in our clusters:
[name@server ~]$ module spider samtools
You can load the version of your choice using module load. For example, to load samtools 1.12 use command
[name@server ~]$ module load samtools/1.12
To load the default version of samtools use command
[name@server ~]$ module load samtools
[name@server ~]$ samtools
Program: samtools (Tools for alignments in the SAM format)
Version: 1.12 (using htslib 1.12)
Usage: samtools <command> [options]
General usage
SAMtools provides various (sub)tools for manipulating alignments in the SAM/BAM format. While working with high-throughput sequencing data, you will probably have to deal with SAM/BAM file formats. The most common task would be to convert your SAM (Sequence Alignment/Map) files to BAM (binary form of SAM) files. BAM files are compressed versions of SAM files and are much smaller in size. BAM files are also easy to manipulate and are ideal for storing large nucleotide sequence alignments. In recent years, an even more compressed CRAM version has been widely preferred for storage.
Converting a SAM file to a BAM file
Prior to converting, verify if your files carry a header section with character “@”. You can inspect the header section using the view command:
[name@server ~]$ samtools view -H my_sample.sam
To convert a SAM file with a header section to a bam file:
[name@server ~]$ samtools view -bo my_sample.bam my_sample.sam
Alternatively, samtools also supports the following format:
[name@server ~]$ samtools view -b my_sample.sam -o my_sample.sam
If header files are absent you can use the reference fasta file used to map the reads
[name@server ~]$ samtools view -bt ref_seq.fa -o my_sample.bam my_sample.sam
Sorting and indexing BAM files
You may also have to sort and index bam files for many downstream applications
[name@server ~]$ samtools sort my_sample.bam -o my_sample_sorted.bam
[name@server ~]$ samtools index my_sample_sorted.bam
You can also convert SAM files directly to a sorted BAM file
With sorted bam files and its respective index file with extension .bai you are all set for any downstream process such as variant calling, feature counts etc.