High-Performance Computing at the NIH

RepeatMasker on Helix
RepeatMasker screens DNA sequences for repetitive elements and low-complexity sequences. It produces a detailed annotation that identifies all of the repetitive elements in a query sequence. RepeatMasker is commonly run before a database search because it also produces a modified version of the query sequence in which all annotated repeats and low-complexity sequences have been masked (default: replaced by N's). Without masking, database searches can give misleading results, because almost 50% of human genomic DNA consists of repetitive or low-complexity sequences. [RepeatMasker website]
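To make the effect of masking concrete, the sketch below counts how many bases in a FASTA file are N. The function name count_masked is hypothetical, and the sketch assumes the default masking character (N). Because N's already present in the input are counted as well, the result is an upper bound on what was masked.

```shell
# count_masked FILE
# Prints "<N_bases> <total_bases>" for a FASTA file.
# Hypothetical helper; assumes masked bases are written as N (the default).
count_masked() {
    # Drop header lines, strip whitespace, and count the remaining characters.
    total=$(grep -v '^>' "$1" | tr -d '\n\r\t ' | wc -c | tr -d ' ')
    # Keep only N/n characters from the sequence lines and count them.
    masked=$(grep -v '^>' "$1" | tr -cd 'Nn' | wc -c | tr -d ' ')
    echo "$masked $total"
}
```

When repeats are found, RepeatMasker writes the masked sequence to a file named after the input with a .masked suffix, so a call might look like: count_masked myseq.fasta.masked (the file name here is only an illustration).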

Sequence comparisons in RepeatMasker are performed by a search engine such as Cross_Match or RMBlast. Query sequences are compared against curated databases of repetitive element families derived from RepBase or RepBase Update.

Version

Type 'repeatmasker' with no parameters to see the currently installed version of RepeatMasker, along with a brief help page.

Running RepeatMasker on Helix
RepeatMasker accepts input sequences in FASTA format only. Sequences in other formats can be converted with the EMBOSS 'seqret' program, as in the sample session below. At the Helix prompt, type 'repeatmasker' with no parameters to get a brief help page; 'repeatmasker -help' prints detailed help.
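Since only FASTA input is accepted, it can help to check a file before submitting it. The following is a minimal sketch, not part of RepeatMasker or EMBOSS: the function names looks_like_fasta and ensure_fasta are hypothetical, and the check is a simple heuristic (the first non-blank line starts with '>'). The conversion step assumes seqret is on your path and uses its non-interactive form.

```shell
# looks_like_fasta FILE
# Succeeds if the first non-blank line starts with '>' (heuristic only).
looks_like_fasta() {
    first=$(grep -v '^[[:space:]]*$' "$1" | head -n 1)
    case "$first" in
        '>'*) return 0 ;;
        *)    return 1 ;;
    esac
}

# ensure_fasta FILE
# Prints the name of a FASTA file to pass to repeatmasker.
# If FILE is not FASTA, converts it with EMBOSS seqret to FILE.fasta.
ensure_fasta() {
    if looks_like_fasta "$1"; then
        echo "$1"
    else
        # -auto suppresses prompts; the fasta:: prefix sets the output format.
        seqret -sequence "$1" -outseq "fasta::$1.fasta" -auto && echo "$1.fasta"
    fi
}
```

A possible use is: repeatmasker "$(ensure_fasta myseq.gb)", where myseq.gb stands for any sequence file of your own.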

On Helix/Biowulf, RepeatMasker is configured with NCBI BLAST (i.e. RMBlast) as the default search engine. To use cross_match instead, select it with the -e flag:

    -e(ngine) [crossmatch|wublast|abblast|ncbi|hmmer|decypher]
        Use an alternate search engine to the default.

Note that wublast, abblast, hmmer and decypher are not configured as search engines on Helix/Biowulf. The only valid choices are:

-e ncbi
-e crossmatch

Please contact staff@helix.nih.gov if you have a particular need for a different search engine.
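In scripts, the engine restriction above can be enforced before RepeatMasker is started. This is only a sketch: the function names valid_engine and run_rm are hypothetical, and the list of accepted engines is taken from the two choices given above.

```shell
# valid_engine NAME
# Succeeds only for the engines configured on Helix/Biowulf.
valid_engine() {
    case "$1" in
        ncbi|crossmatch) return 0 ;;
        *)               return 1 ;;
    esac
}

# run_rm ENGINE FASTA_FILE
# Runs repeatmasker with the chosen engine, or returns 2 without running
# anything if the engine is not one of the configured choices.
run_rm() {
    engine=$1
    input=$2
    if ! valid_engine "$engine"; then
        echo "engine '$engine' is not configured here; use ncbi or crossmatch" >&2
        return 2
    fi
    repeatmasker -e "$engine" "$input"
}
```

For example, run_rm crossmatch ay001401.fasta runs the same command as the second half of the sample session below, while run_rm wublast ay001401.fasta stops with an error message.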

Sample session (user input follows the prompt):

In this sample session, the GenBank sequence AY001401 is retrieved with the EMBOSS seqret program and saved in FASTA format. It is then analyzed with RepeatMasker, first with the default search engine (NCBI BLAST/RMBlast) and then with Crossmatch.

	  
helix% emboss
[...]
[user@helix ~]$ seqret
Reads and writes (returns) sequences
Input (gapped) sequence(s): genbank:ay001401      
output sequence(s) [ay001401.fasta]: 
[user@helix ~]$ repeatmasker ay001401.fasta
RepeatMasker version open-4.0.0
Search Engine: NCBI/RMBLAST [ 2.2.27+ ]
Master RepeatMasker Database: /usr/local/apps/RepeatMasker/4.0.0/Libraries/RepeatMaskerLib.embl ( Complete Database: 20120418 )


Building species libraries in: /usr/local/apps/RepeatMasker/4.0.0/Libraries/20120418/homo_sapiens
   - 1860 ancestral and ubiquitous sequence(s) for homo sapiens
   - 9 lineage specific sequence(s) for homo sapiens

analyzing file ay001401.fasta

Checking for E. coli insertion elements
identifying Simple Repeats in batch 1 of 1
identifying full-length ALUs in batch 1 of 1
identifying full-length interspersed repeats in batch 1 of 1
identifying remaining ALUs in batch 1 of 1
identifying most interspersed repeats in batch 1 of 1
identifying long interspersed repeats in batch 1 of 1
identifying ancient repeats in batch 1 of 1
identifying retrovirus-like sequences in batch 1 of 1
identifying tough LINE1s in batch 1 of 1
identifying Simple Repeats in batch 1 of 1



No repetitive sequences were detected in ay001401.fasta

[user@helix ~]$ repeatmasker -e crossmatch ay001401.fasta
RepeatMasker version open-4.0.0
Search Engine: Crossmatch [ 1.090518 ]
Master RepeatMasker Database: /usr/local/apps/RepeatMasker/4.0.0/Libraries/RepeatMaskerLib.embl ( Complete Database: 20120418 )



analyzing file ay001401.fasta

Checking for E. coli insertion elements
identifying Simple Repeats in batch 1 of 1
identifying full-length ALUs in batch 1 of 1
identifying full-length interspersed repeats in batch 1 of 1
identifying remaining ALUs in batch 1 of 1
identifying most interspersed repeats in batch 1 of 1
identifying long interspersed repeats in batch 1 of 1
identifying ancient repeats in batch 1 of 1
identifying retrovirus-like sequences in batch 1 of 1
identifying tough LINE1s in batch 1 of 1
identifying Simple Repeats in batch 1 of 1



No repetitive sequences were detected in ay001401.fasta

helix%
Documentation
repeatmasker.help