BWA is a fast, lightweight tool that aligns short sequences to a sequence database, such as the human reference genome. By default, BWA finds alignments within edit distance 2 of the query sequence, except that it disallows gaps close to the end of the query. It can also be tuned to find a fraction of longer gaps, at the cost of speed and of more false alignments.
BWA excels in speed. Mapping 2 million high-quality 35bp short reads against the human genome can be done in 20 minutes. Aligners usually gain speed at the cost of large memory use, disallowing gaps, and/or hard limits on the maximum read length and the maximum number of mismatches. BWA makes none of these trade-offs: it is still relatively lightweight (2.3GB of memory for human alignment), performs gapped alignment, and does not set a hard limit on read length or maximum mismatches.
Given a database file in FASTA format, BWA first builds a BWT index with the 'index' command. The alignments in suffix array (SA) coordinates are then generated with the 'aln' command. The resulting file contains ALL the alignments found by BWA. The 'samse' (single-end) or 'sampe' (paired-end) command then converts SA coordinates to chromosomal coordinates. For single-end reads, most of the computing time is spent on finding the SA coordinates (the aln command). For paired-end reads, half of the computing time may be spent on pairing (the sampe command) given 32bp reads. Using longer reads would reduce the fraction of time spent on pairing, because each end in a pair would be mapped to fewer places.
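The three steps above can be sketched for paired-end reads as follows. This is only an illustration under stated assumptions: the function name bwa_pe_plan and all file names are placeholders, not part of BWA, and the sketch prints the commands instead of running them, so it can be inspected without BWA installed.

```shell
# Hypothetical helper: print the paired-end BWA command sequence.
# Arguments: reference FASTA, read file 1, read file 2, output prefix.
bwa_pe_plan() {
    ref=$1; r1=$2; r2=$3; out=$4
    # Step 1: build the BWT index for the reference.
    echo "bwa index -a bwtsw $ref"
    # Step 2: find SA coordinates for each end separately.
    echo "bwa aln $ref $r1 > ${out}_1.sai"
    echo "bwa aln $ref $r2 > ${out}_2.sai"
    # Step 3: pair the ends and convert to chromosomal coordinates (SAM).
    echo "bwa sampe $ref ${out}_1.sai ${out}_2.sai $r1 $r2 > $out.sam"
}

# Placeholder file names; substitute your own.
bwa_pe_plan ref.fa reads_1.fastq reads_2.fastq aln
```

Once the printed commands look right, they can be run one at a time at the prompt. For single-end reads, only one aln step is needed and samse replaces sampe.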
How to Use
There are multiple versions of BWA available. An easy way of selecting the version is to use modules. To see the modules available, type
module avail bwa
To select a module, type
module load bwa/[ver]
where [ver] is the version of choice. This will set your $PATH variable.
Pre-built BWA index files are available in
/fdb/igenomes/[organism]/[source]/[build]/Sequence/BWAIndex/genome.fa
where:
- [organism] is the specific organism of interest (Gallus_gallus, Rattus_norvegicus, etc.)
- [source] is the source for the sequence (NCBI, Ensembl, UCSC)
- [build] is the specific genome draft of interest (hg19, build37.2, GRCh37)
Some users have noticed that newer versions of BWA don't work with index files from previous versions in /fdb/bwa/indexes. Please use the index files above under /fdb/igenomes instead.
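As an illustration of how the index path is assembled from its three components, here is a minimal sketch. The function name igenomes_bwa_index is hypothetical, and the Homo_sapiens/UCSC/hg19 combination is an assumption made for the example; check which directories actually exist under /fdb/igenomes before relying on it.

```shell
# Hypothetical helper: build the BWA index path from organism, source, build.
igenomes_bwa_index() {
    organism=$1; source_name=$2; build=$3
    echo "/fdb/igenomes/${organism}/${source_name}/${build}/Sequence/BWAIndex/genome.fa"
}

# Assumed example values; substitute the organism, source and build you need.
idx=$(igenomes_bwa_index Homo_sapiens UCSC hg19)
echo "$idx"
```

The resulting path can then be passed to bwa aln and bwa samse/sampe in place of a locally built index.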
BWA sample files can be copied from:
/usr/local/src/bwa/sample
Copy these sample files into your own area, then run:
$ cd /home/user/bwa/run1
$ module load bwa
$ bwa index -a bwtsw tttF3.csfasta
$ bwa aln tttF3.csfasta ttt.fastq > ttt.sai
$ bwa samse tttF3.csfasta ttt.sai ttt.fastq > ttt.sam
To see a full listing of the options available for bwa, type bwa at the prompt.
$ bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.6.2-r126
Contact: Heng Li <firstname.lastname@example.org>
Usage:   bwa <command> [options]
Command: index         index sequences in the FASTA format
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries
         fastmap       identify super-maximal exact matches
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ
         pac2cspac     convert PAC to color-space PAC
         stdsw         standard SW/NW alignment