Samtools: Difference between revisions

From Alliance Doc
Jump to navigation Jump to search
No edit summary
No edit summary
Line 31: Line 31:


Usage:  samtools <command> [options]}}
Usage:  samtools <command> [options]}}
== General usage == <!--T:3-->
SAMtools provides various (sub)tools for manipulating alignments in the SAM/BAM format. While working with high-throughput sequencing data, you will probably have to deal with SAM/BAM file formats. The most common task would be to convert your SAM (Sequence Alignment/Map) files to BAM (binary form of SAM) files. BAM files are compressed versions of SAM files and are much smaller in size. BAM files are also easy to manipulate and are ideal for storing large nucleotide sequence alignments. In recent years, an even more compressed CRAM version has been widely preferred for storage.
=== Converting a SAM file to a BAM file ===
Prior to converting, verify if your files carry a header section with character “@”. You can inspect the header section using the view command:
{{Command|samtools view -H my_sample.sam}}
To convert a SAM file with a header section to a bam file:
{{Command|samtools view -bo my_sample.bam my_sample.sam}}
Alternatively, samtools also supports the following format:
{{Command|samtools view -b my_sample.sam -o my_sample.sam}}
If header files are absent you can use the reference fasta file used to map the reads
{{Command|samtools view -bt ref_seq.fa -o my_sample.bam my_sample.sam}}
=== Sorting and indexing BAM files ===
You may also have to sort and index bam files for many downstream applications
{{Commands
|samtools sort my_sample.bam -o my_sample_sorted.bam
|samtools index my_sample_sorted.bam}}
You can also convert SAM files directly to a sorted BAM file
{{Commnd|samtools view -b my_sample.sam | samtools sort -o my_sample_sorted.bam}}
With sorted bam files and its respective index file with extension .bai you are all set for any downstream process such as variant calling, feature counts etc.

Revision as of 18:50, 9 November 2022


This article is a draft

This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.



Description

Samtools is a suite of programs for interacting with high-throughput sequencing data. It consists of three separate repositories:

  • Samtools: Reading/writing/editing/indexing/viewing SAM/BAM/CRAM format
  • BCFtools: Reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants
  • HTSlib: A C library for reading/writing high-throughput sequencing data

Samtools and BCFtools both use HTSlib internally, but these source packages contain their own copies of htslib so they can be built independently.

Note: This page does not cover all features of SAMtools. Please refer to Samtools for the complete list of all subtools.

Availability and Loading Module

To query all available version of SAMtools in our clusters:

Question.png
[name@server ~]$ module spider samtools

You can load the version of your choice using module load. For example, to load samtools 1.12 use command

Question.png
[name@server ~]$ module load samtools/1.12

To load the default version of samtools use command

[name@server ~]$ module load samtools
[name@server ~]$ samtools

Program: samtools (Tools for alignments in the SAM format)
Version: 1.12 (using htslib 1.12)

Usage:   samtools <command> [options]

General usage

SAMtools provides various (sub)tools for manipulating alignments in the SAM/BAM format. While working with high-throughput sequencing data, you will probably have to deal with SAM/BAM file formats. The most common task would be to convert your SAM (Sequence Alignment/Map) files to BAM (binary form of SAM) files. BAM files are compressed versions of SAM files and are much smaller in size. BAM files are also easy to manipulate and are ideal for storing large nucleotide sequence alignments. In recent years, an even more compressed CRAM version has been widely preferred for storage.

Converting a SAM file to a BAM file

Prior to converting, verify if your files carry a header section with character “@”. You can inspect the header section using the view command:

Question.png
[name@server ~]$ samtools view -H my_sample.sam

To convert a SAM file with a header section to a bam file:

Question.png
[name@server ~]$ samtools view -bo my_sample.bam my_sample.sam

Alternatively, samtools also supports the following format:

Question.png
[name@server ~]$ samtools view -b my_sample.sam -o my_sample.sam

If header files are absent you can use the reference fasta file used to map the reads

Question.png
[name@server ~]$ samtools view -bt ref_seq.fa -o my_sample.bam my_sample.sam

Sorting and indexing BAM files

You may also have to sort and index bam files for many downstream applications

[name@server ~]$ samtools sort my_sample.bam -o my_sample_sorted.bam
[name@server ~]$ samtools index my_sample_sorted.bam


You can also convert SAM files directly to a sorted BAM file

Template:Commnd

With sorted bam files and its respective index file with extension .bai you are all set for any downstream process such as variant calling, feature counts etc.