Scientific Supercomputing at the NIH

G-Mo.R-Se on Helix

G-Mo.R-Se is a method aimed at using RNA-Seq short reads to build de novo gene models.
First, candidate exons are built directly from the positions of the reads mapped to the genome, without any ab initio assembly of the reads. Then all possible splice junctions between those exons are tested against the unmapped reads, so that junction testing is directed by the information in the RNA-Seq dataset rather than by prior knowledge of the genome. Exons can thus be chained into stranded gene models.
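
To make the junction test concrete, here is a simplified shell sketch (POSIX sh with awk and grep). It assumes that a read supports a junction when it contains, as an exact substring, the last K bases of the upstream exon joined to the first K bases of the downstream exon. The function names junction_seq and count_supporting_reads and the anchor length K are hypothetical; reverse-strand reads, mismatches and the actual G-Mo.R-Se validation criteria are not modelled.

```shell
# Simplified illustration (NOT the G-Mo.R-Se scoring rules): a read supports
# a junction if it contains the last K bases of the upstream exon followed
# by the first K bases of the downstream exon. Forward strand, exact match.

# junction_seq UPSTREAM_EXON DOWNSTREAM_EXON K
junction_seq() {
    awk -v d="$1" -v a="$2" -v k="$3" \
        'BEGIN { print substr(d, length(d) - k + 1) substr(a, 1, k) }'
}

# count_supporting_reads UPSTREAM_EXON DOWNSTREAM_EXON K < reads.fa
# Counts FASTA sequence lines on stdin (one line per read assumed)
# that contain the junction sequence.
count_supporting_reads() {
    j=$(junction_seq "$1" "$2" "$3")
    # grep -c prints 0 but exits 1 when nothing matches; keep going anyway.
    grep -v '^>' | grep -c -F -e "$j" || true
}
```

For example, an upstream exon ending in GACG and a downstream exon starting with TTTA give the junction sequence GACGTTTA for K = 4, which a read such as GGACGTTTAC would support.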

Programs location

/usr/local/gmorse/bin

Version

Beta

How To Use

The example files used below are in /usr/local/gmorse/example. Copy them to a directory of your own and run the commands from there, for example:

% cp -r /usr/local/gmorse/example ~/gmorse_example

% cd ~/gmorse_example

Example

First, use SOAP to build the genome index and map the reads:

% /usr/local/soap/2bwt-builder genome.test.fa

% /usr/local/soap/soap -a 'reads.fastq' -D genome.test.fa.index -o reads.soap

Extract unmapped reads:

% /usr/local/gmorse/bin/extractNonMappedReads reads.soap reads.fastq > unmapped.fa
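
A minimal sketch of what this step produces, assuming 4-line FASTQ records and that SOAP writes the read ID (without the leading @) in the first column of reads.soap: every read whose ID never appears in the alignments is written out in FASTA format. The function name unmapped_reads is hypothetical, and this is not the extractNonMappedReads implementation.

```shell
# unmapped_reads ALIGNMENTS READS.fastq  -> FASTA of reads absent from ALIGNMENTS
# Assumptions: column 1 of ALIGNMENTS is the read ID; FASTQ records are
# exactly 4 lines (no wrapped sequences).
unmapped_reads() {
    awk 'FILENAME == ARGV[1] { mapped[$1] = 1; next }   # alignments: remember IDs
         FNR % 4 == 1 { id = substr($1, 2); next }      # FASTQ header: drop the @
         FNR % 4 == 2 && !(id in mapped) { print ">" id; print }' "$1" "$2"
}

# Usage: unmapped_reads reads.soap reads.fastq > unmapped.sketch.fa
```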

Create an output directory and calculate the read coverage:

% mkdir OUT

% /usr/local/gmorse/bin/coverage reads.soap > OUT/test.coverage
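
The idea behind the coverage step can be sketched in awk as counting, for every genomic position, how many aligned reads overlap it. This assumes SOAP2's tab-separated output with the read length in column 6, the reference name in column 8 and the 1-based leftmost position in column 9, and it ignores gapped alignments. The function name and the three-column output (chromosome, position, depth) are hypothetical; the format of OUT/test.coverage may differ.

```shell
# per_base_coverage ALIGNMENTS -> "chrom<TAB>position<TAB>depth", sorted
# Assumed SOAP2 columns: 6 = read length, 8 = reference, 9 = 1-based start.
per_base_coverage() {
    awk -F '\t' '{
        for (p = $9 + 0; p < $9 + $6; p++)   # every base covered by this hit
            depth[$8 "\t" p]++
    }
    END { for (k in depth) print k "\t" depth[k] }' "$1" |
        LC_ALL=C sort -k1,1 -k2,2n
}
```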

Build covtigs (candidate exons built from the read coverage):

% /usr/local/gmorse/bin/build_covtigs OUT/test.coverage 4 > OUT/test.covtigs.before_extension
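
Covtigs can be pictured as maximal runs of consecutive covered positions. The sketch below assumes sorted three-column input (chromosome, position, depth) and treats the numeric argument (4 above) as a minimum depth; both the function name covtigs_from_coverage and that reading of the argument are assumptions, not documented build_covtigs behaviour.

```shell
# covtigs_from_coverage COVERAGE MIN_DEPTH -> "chrom<TAB>start<TAB>end"
# Emits maximal runs of consecutive positions with depth >= MIN_DEPTH.
covtigs_from_coverage() {
    awk -v min="$2" '
        function emit_run() { if (inrun) print chrom "\t" start "\t" end; inrun = 0 }
        $3 < min { emit_run(); next }                   # too shallow: close the run
        inrun && $1 == chrom && $2 == end + 1 {         # next base of the same run
            end = $2 + 0; next
        }
        { emit_run(); chrom = $1; start = $2 + 0; end = $2 + 0; inrun = 1 }
        END { emit_run() }' "$1"
}
```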

Extend and fuse the covtigs (this step also uses the genome and the unmapped reads):

% /usr/local/gmorse/bin/extend_covtigs OUT/test.covtigs.before_extension genome.test.fa unmapped.fa OUT/test.covtigs
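
The fusion half of this step can be illustrated by merging covtigs that overlap or abut, which is one plausible outcome once covtigs have been extended. This rule is an assumption, and the read-based extension itself is not shown. Input is a hypothetical three-column list (chromosome, start, end); fuse_covtigs is a made-up name.

```shell
# fuse_covtigs COVTIGS -> merged "chrom<TAB>start<TAB>end"
# Assumed rule: fuse when start <= previous end + 1 on the same chromosome.
fuse_covtigs() {
    LC_ALL=C sort -k1,1 -k2,2n "$1" | awk '
        function emit() { if (inrun) print chrom "\t" start "\t" end }
        inrun && $1 == chrom && $2 <= end + 1 {     # overlaps or abuts: fuse
            if ($3 > end) end = $3 + 0
            next
        }
        { emit(); chrom = $1; start = $2 + 0; end = $3 + 0; inrun = 1 }
        END { emit() }'
}
```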

Validate the splice junctions between covtigs against the unmapped reads:

% /usr/local/gmorse/bin/gmorse -r unmapped.fa -c OUT/test.covtigs -f genome.test.fa -v > OUT/test.junctionsVAL
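
Per the description at the top of this page, the candidates are all possible junctions between exons. A minimal sketch of that enumeration, assuming a hypothetical three-column covtig list (chromosome, start, end) and taking the end of the upstream covtig as donor and the start of the downstream one as acceptor; the real tool may place splice sites differently. Note that the number of pairs grows quadratically with the number of covtigs. candidate_junctions is a hypothetical name.

```shell
# candidate_junctions COVTIGS -> "chrom<TAB>donor_end<TAB>acceptor_start"
# Every ordered pair of covtigs on the same chromosome where the first
# ends before the second starts.
candidate_junctions() {
    awk '{ c[NR] = $1; s[NR] = $2 + 0; e[NR] = $3 + 0 }
         END {
             for (i = 1; i <= NR; i++)
                 for (j = 1; j <= NR; j++)
                     if (c[i] == c[j] && e[i] < s[j])
                         print c[i] "\t" e[i] "\t" s[j]
         }' "$1"
}
```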

Build gene models from the validated junctions:

% /usr/local/gmorse/bin/build_models OUT/test.junctionsVAL chr12:10115000-10123000 OUT/test.models.beforefusion.gff
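
Chaining exons into models can be pictured as grouping exons that are linked, directly or indirectly, by validated junctions. This sketch uses union-find over a hypothetical two-column list of exon IDs (one validated junction per line); it only forms the groups and does not order exons along the genome, assign strands or write GFF as build_models does. group_exons is a hypothetical name.

```shell
# group_exons JUNCTIONS -> one line per group, exon IDs sorted
group_exons() {
    awk '
        function root_of(x) { while (parent[x] != x) x = parent[x]; return x }
        {
            if (!($1 in parent)) parent[$1] = $1
            if (!($2 in parent)) parent[$2] = $2
            a = root_of($1); b = root_of($2)
            if (a != b) parent[b] = a                # join the two groups
        }
        END { for (x in parent) print root_of(x) "\t" x }' "$1" |
        LC_ALL=C sort -k1,1 -k2,2 |
        awk '$1 != group { if (NR > 1) print line; group = $1; line = $2; next }
             { line = line " " $2 }
             END { if (NR > 0) print line }'
}
```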

Fuse models using ORF (open reading frame) information:

% /usr/local/gmorse/bin/fuse_models OUT/test.models.beforefusion.gff genome.test.fa OUT/test.models.gff
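
As a hedged illustration of one ingredient ORF-based fusion plausibly relies on, the sketch below finds the longest ATG-to-stop ORF on the forward strand of a sequence across the three reading frames. How fuse_models actually uses ORFs to decide on fusions is not shown, the reverse strand is ignored, and longest_orf is a hypothetical name.

```shell
# longest_orf SEQUENCE -> "start<TAB>end<TAB>length" (1-based, stop included)
# Prints nothing if no ATG...stop ORF exists on the forward strand.
longest_orf() {
    awk -v seq="$1" 'BEGIN {
        seq = toupper(seq); n = length(seq); best = 0
        for (f = 1; f <= 3; f++) {                 # the three forward frames
            start = 0
            for (i = f; i + 2 <= n; i += 3) {
                codon = substr(seq, i, 3)
                if (start == 0 && codon == "ATG")
                    start = i                      # open an ORF at the first ATG
                else if (start > 0 && (codon == "TAA" || codon == "TAG" || codon == "TGA")) {
                    len = i + 2 - start + 1        # length including the stop codon
                    if (len > best) { best = len; best_start = start; best_end = i + 2 }
                    start = 0                      # look for the next ATG
                }
            }
        }
        if (best > 0) print best_start "\t" best_end "\t" best
    }'
}
```

For example, in CCATGAAATAGCC the ORF ATG AAA TAG spans positions 3 to 11 (9 bases).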

Documentation

http://helix.nih.gov/Applications/gmorse.txt