Scientific Supercomputing at the NIH

Genome Mapping and Assembly with MAQ on Helix

Maq stands for Mapping and Assembly with Quality. It is software that builds mapping assemblies by aligning short reads, generated by next-generation sequencing machines, to reference sequences. It is designed particularly for the Illumina-Solexa 1G Genetic Analyzer and has preliminary functions to handle ABI SOLiD data.

Maq first aligns reads to the reference sequences and then calls the consensus. At the mapping stage, Maq performs ungapped alignment. For single-end reads, Maq finds all hits with up to 2 or 3 mismatches, depending on a command-line option; for paired-end reads, it always finds all paired hits in which one of the two reads contains up to 1 mismatch. At the assembly stage, Maq calls the consensus based on a statistical model: at each position along the consensus, it calls the base that maximizes the posterior probability and calculates a phred quality. Heterozygotes are also called in this process.
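To make the two ideas in the paragraph above concrete, here is a minimal Python sketch. It is not Maq's implementation: Maq uses an indexed search and its own statistical model, while this sketch scans every position by brute force. The function names find_ungapped_hits and phred_quality, and the toy sequences, are hypothetical and chosen only for illustration. The phred conversion uses the standard definition Q = -10 * log10(P_error).

```python
import math


def find_ungapped_hits(read, reference, max_mismatches=2):
    """Return (start, mismatches) for every ungapped placement of `read`
    on `reference` with at most `max_mismatches` mismatching bases.

    Brute-force illustration only; not how Maq searches internally.
    """
    hits = []
    read_len = len(read)
    for start in range(len(reference) - read_len + 1):
        window = reference[start:start + read_len]
        # Ungapped alignment: compare base by base, no insertions or deletions.
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


def phred_quality(error_probability):
    """Convert an error probability in (0, 1] to a phred-scaled quality."""
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error_probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


if __name__ == "__main__":
    # Toy data, assumed for this example only.
    reference = "ACGTACGTTAGCCGATAGGCTA"
    read = "TTAGCAGAT"
    print(find_ungapped_hits(read, reference, max_mismatches=2))
    print(phred_quality(0.001))
```

In this toy case the read fits at one position of the reference with a single mismatch, and an error probability of 0.001 corresponds to a phred quality of 30.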

Maq is a project hosted by SourceForge.net. The project page is available at http://sourceforge.net/projects/maq/.

Version

Type '/usr/local/maq/bin/maq.pl' on the command line to display the installed version.

Sample Session on Helix

DO NOT RUN MAQ ON THE BIOWULF HEAD NODE. RUN IT ON HELIX.

The MAQ sample files are available at:

/usr/local/maq/ref.fasta
/usr/local/maq/calib-36.dat.gz

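Copy the sample files into a directory in your own area, uncompress the calibration file, and run the demo:

% cd /home/user/maq/run1

% cp /usr/local/maq/ref.fasta /usr/local/maq/calib-36.dat.gz .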

% gunzip calib-36.dat.gz

% /usr/local/maq/maq-0.7.1/bin/maq.pl demo ref.fasta calib-36.dat

Documentation

http://maq.sourceforge.net/index.shtml