SHRiMP on Helix
SHRiMP is a software package for aligning genomic reads against a target genome. It was developed primarily with the large numbers of short reads produced by next-generation sequencing machines in mind, as well as Applied Biosystems' colourspace genomic representation.
SHRiMP was originally designed and written by Michael Brudno and Stephen M. Rumble, with considerable input and testing by the SidowLab. Since then, Adrian Dalca, Marc Fiume and Vladimir Yanovsky have made considerable contributions to the probability calculations and 2-pass SMS mapping algorithms. Matei David joined the project with the 1.3.0 release. The latest version, 2.0, includes significant contributions by Daniel Lister and Michael Dzamba. The authors may be contacted via e-mail at: shrimp at cs.toronto.edu.
How To Use
The following example takes about 30 minutes to finish on Helix using 2 threads (-N 2). If you have multiple runs (for example, 4 runs) or longer runs (for example, more than 1 hour), apply for a Biowulf account (which comes free with a Helix account) and run your jobs on the Biowulf cluster. See SHRiMP on Biowulf.
The two sample input files below can be copied from /usr/local/src/shrimp/example:
See the 'README' file below to estimate the RAM requirement for your job.
$ mkdir /data/user/shrimp/run1; cd /data/user/shrimp/run1
$ cp /usr/local/src/shrimp/example/* .
$ /usr/local/shrimp/bin/gmapper-cs test_S1_F3.csfasta ch11_12_validated.fasta -N 2 -o 5 -h 80% >map.out 2>map.log
For this example, about 5-6 GB of memory is used.
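Because the example needs several gigabytes of memory, it can help to check the inputs and the available memory before starting the run. The following is only a sketch, not part of SHRiMP: the function names (mem_available_kb, enough_memory, run_example) are hypothetical, the 6 GB threshold is an assumption taken from the usage noted above, and the memory check assumes a Linux system that reports MemAvailable in /proc/meminfo. The gmapper-cs command line itself is copied unchanged from the example.

```shell
#!/bin/bash
# Sketch: run the example alignment only if the inputs are present and
# enough memory appears to be available. File names and gmapper-cs flags
# are copied from the example; everything else is an assumption.

READS=test_S1_F3.csfasta
GENOME=ch11_12_validated.fasta
THREADS=2
NEED_KB=$((6 * 1024 * 1024))   # assumed requirement: 6 GB, expressed in kB

# Print the MemAvailable value (in kB) from /proc/meminfo.
# Prints 0 if the file or the field is missing.
mem_available_kb() {
    kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null || true)
    echo "${kb:-0}"
}

# Succeed when the available memory is at least the given number of kB.
enough_memory() {
    [ "$(mem_available_kb)" -ge "$1" ]
}

# Check the inputs and memory, then start the alignment.
run_example() {
    for f in "$READS" "$GENOME"; do
        [ -r "$f" ] || { echo "missing input: $f" >&2; return 1; }
    done
    enough_memory "$NEED_KB" || { echo "less than 6 GB available" >&2; return 1; }
    /usr/local/shrimp/bin/gmapper-cs "$READS" "$GENOME" -N "$THREADS" -o 5 -h 80% >map.out 2>map.log
}
```

To use it, define the functions in the run directory after copying the sample files, then call run_example. On a shared system the available memory can change after the check, so this is a rough guard rather than a guarantee.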