Novoalign is an aligner for single-ended and paired-end reads from the Illumina Genome Analyser. Novoalign finds global optimum alignments using the full Needleman-Wunsch algorithm with affine gap penalties, whilst performing at the same or better speed than aligners that are limited to two mismatches and no insertions or deletions.
There are MPI versions of Novoalign and NovoalignCS available, but there is little point in running them on Helix. Novoalign and NovoalignCS can run using threads (i.e. using multiple cores/processors on a single system such as Helix), and the MPI versions provide no added benefit.
If you have multiple runs (such as 4 runs) or longer runs (such as more than 1 hour), you should use Novocraft on Biowulf.
The paths for the Novocraft programs are set up using 'module load novocraft' as in the example below.
Novoalign indexes for some common genome assemblies such as hg18 and hg19 are available in /fdb/novoalign. If there are other genomes you want indexed, please email firstname.lastname@example.org
$ cd /data/username/mydir
$ module load novocraft
$ novoalign -c 4 -d /fdb/novoalign/chr_all_hg19.nbx -f read_1.fastq.gz read_2.fastq.gz -i 200,50
This example aligns Illumina paired-end reads to a reference genome. The expected size distribution for this sequencing run has a mean of 200 and a standard deviation of 50 (-i 200,50). -c 4 tells the program to run with 4 threads. Do not use more than 8 threads on Helix for any Novocraft program.
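As a minimal sketch of how the 8-thread limit could be enforced in a wrapper script, the snippet below clamps a requested thread count before building the command line. The function name cap_threads is hypothetical and not part of Novocraft; the snippet only prints the novoalign command rather than running it.

```shell
# Hypothetical helper (not part of Novocraft): clamp a requested thread
# count to the 8-thread limit that applies on Helix.
cap_threads() {
    requested=$1
    max=8
    if [ "$requested" -gt "$max" ]; then
        echo "$max"
    else
        echo "$requested"
    fi
}

# Ask for 12 threads; the helper reduces this to 8.
threads=$(cap_threads 12)

# Print the command for inspection instead of executing it.
echo "novoalign -c $threads -d /fdb/novoalign/chr_all_hg19.nbx -f read_1.fastq.gz read_2.fastq.gz -i 200,50"
```

Remove the echo and the surrounding quotes on the last line to run the command after 'module load novocraft'.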
Typing the name of a Novocraft program with no parameters prints its usage summary on your screen. This is sometimes more up-to-date than the printed documentation, e.g.
helix% module load novocraft
helix% novoalign
Error: Missing reference database name (-d option).
Novoalign V2.08.03
Usage:
    novoalign options
Options:
    -d dbname        Full pathname of indexed reference sequence from novoindex
    --mmapOff        Turns off memory mapping for the index. By default the index file
                     is memory mapped allowing it to be shared by multiple instances
                     of Novoalign.
    --LockIdx        Use MAP_LOCKED flag when memory mapping the index.
Options for Read processing:
    -f read1 read2   Filenames for the read sequences for Side 1 & 2.
                     If only one file is specified then single end reads are processed.
                     If two files are specified then the program will operate in
                     paired end mode.
    --hdrhd [9|off]  Controls checking of identity between headers in paired end reads.
                     Sets the Hamming Distance or disables the check. Default is a
                     Hamming Distance of not more than 1. Processing will stop with
                     appropriate error messages if Hamming Distance exceeds the limit.
    -F format        Specifies a read file format, refer to manual for full list of options.
                     For Fastq '_sequence.txt' files from Illumina CASAVA 1.3 to 1.7 use -F ILMFQ.
                     CASAVA 1.8 and later use -F ILM1.8
                     Pre 1.3 use -F SLXFQ
                     Sanger standard use -F STDFQ
    QSEQ & ILM1.8 files include reads that have been flagged as low quality by the
    base caller. Specify how these are processed with the following options:
    [...etc...]
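The -F guidance in the usage text above can be condensed into a small lookup. This is only a sketch: the function name format_flag and the labels it accepts (pre-1.3, 1.3-1.7, 1.8+, sanger) are made up for illustration, while the -F values themselves are taken from the usage summary. The command is printed, not executed.

```shell
# Hypothetical helper: map an illustrative label for the FASTQ source
# to the -F value listed in the novoalign usage summary.
format_flag() {
    case "$1" in
        pre-1.3)  echo "SLXFQ"  ;;   # Illumina output before 1.3
        1.3-1.7)  echo "ILMFQ"  ;;   # CASAVA 1.3 to 1.7
        1.8+)     echo "ILM1.8" ;;   # CASAVA 1.8 and later
        sanger)   echo "STDFQ"  ;;   # Sanger standard FASTQ
        *)        echo "unknown format label: $1" >&2; return 1 ;;
    esac
}

# Example: reads produced by CASAVA 1.8 or later.
fmt=$(format_flag 1.8+)
echo "novoalign -d /fdb/novoalign/chr_all_hg19.nbx -F $fmt -f read_1.fastq.gz read_2.fastq.gz"
```

Refer to the Novocraft manual for the full list of -F values; only the four named in the usage summary are covered here.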
PDF documentation available at http://www.novocraft.com/