Scientific Supercomputing at the NIH

SAMtools on Helix

SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments. SAM aims to be a format that:

- Is flexible enough to store all the alignment information generated by various alignment programs;
- Is simple enough to be easily generated by alignment programs or converted from existing alignment formats;
- Is compact in file size;
- Allows most operations on the alignment to work on a stream without loading the whole alignment into memory;
- Allows the file to be indexed by genomic position to efficiently retrieve all reads aligning to a locus.
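
As a rough illustration of the streaming goal above: SAM alignment records are tab-separated text lines, with the reference sequence name in column 3, so a line-at-a-time tool can summarize them without holding the file in memory. The sketch below is not part of SAMtools; count_by_ref is a hypothetical helper name and the three records are invented for illustration.

```shell
# Count alignments per reference sequence, reading SAM text one line at a time.
# count_by_ref is a hypothetical helper, not a SAMtools command.
count_by_ref() {
    awk -F '\t' '
        /^@/      { next }        # skip header lines, which start with "@"
        $3 != "*" { n[$3]++ }     # column 3 is the reference name; "*" means none
        END       { for (r in n) print r "\t" n[r] }
    ' | sort
}

# Three made-up records (illustration only, not taken from ex1.sam.gz)
printf 'read1\t0\tseq1\t100\t30\t4M\t*\t0\t0\tACGT\tIIII\n' >  demo.sam
printf 'read2\t0\tseq1\t150\t30\t4M\t*\t0\t0\tACGT\tIIII\n' >> demo.sam
printf 'read3\t0\tseq2\t20\t30\t4M\t*\t0\t0\tACGT\tIIII\n'  >> demo.sam

count_by_ref < demo.sam
# prints:
# seq1	2
# seq2	1
```

The same function should also work on the text produced by 'samtools view ex1.bam', since that command writes alignments in SAM format to standard output.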

SAMtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.

Version

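To check the installed version, type '/usr/local/samtools/samtools' on the command line. Run without arguments, samtools prints a usage message that includes its version number.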

Sample Session on Helix

SAMtools sample files can be copied from:

/usr/local/samtools/examples

Copy these sample files to a directory in your own area, then run:

% cd /home/user/samtools/run1

% /usr/local/samtools/samtools faidx ex1.fa # index the reference FASTA

% /usr/local/samtools/samtools import ex1.fa.fai ex1.sam.gz ex1.bam # SAM->BAM

% /usr/local/samtools/samtools index ex1.bam # index BAM

% /usr/local/samtools/samtools tview ex1.bam ex1.fa # view alignment

% /usr/local/samtools/samtools pileup -cf ex1.fa ex1.bam # pileup and consensus
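
The non-interactive steps of the session above could be collected into a small script. This is a minimal sketch under stated assumptions, not a Helix-provided tool: SAMTOOLS, EXAMPLES, RUNDIR and run_session are hypothetical names, the defaults are the paths shown on this page, the examples directory is assumed to contain ex1.fa and ex1.sam.gz, and the ex1.pileup output file name is chosen here only for illustration.

```shell
#!/bin/sh
# Hypothetical variables; the defaults are the paths used on this page.
SAMTOOLS=${SAMTOOLS:-/usr/local/samtools/samtools}
EXAMPLES=${EXAMPLES:-/usr/local/samtools/examples}
RUNDIR=${RUNDIR:-/home/user/samtools/run1}

# Runs in a subshell so the caller's working directory is unchanged.
# Each step runs only if the previous one succeeded.
run_session() (
    mkdir -p "$RUNDIR" &&
    cp -r "$EXAMPLES"/. "$RUNDIR"/ &&                    # copy the sample files
    cd "$RUNDIR" &&
    "$SAMTOOLS" faidx ex1.fa &&                          # index the reference FASTA
    "$SAMTOOLS" import ex1.fa.fai ex1.sam.gz ex1.bam &&  # SAM->BAM
    "$SAMTOOLS" index ex1.bam &&                         # index BAM
    "$SAMTOOLS" pileup -cf ex1.fa ex1.bam > ex1.pileup   # pileup and consensus (assumed file name)
)

# To run the steps: run_session
```

The tview step is left out because it opens an interactive viewer; run it by hand from the run directory once the BAM file and its index exist.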

Documentation

http://samtools.sourceforge.net/