Sequence comparisons in RepeatMasker are performed by cross_match. Comparisons are made against curated databases of repetitive element families derived from RepBase or RepBase Update.
Features:
- Screens DNA sequences for repetitive elements including small RNA pseudogenes, Alus, LINEs, SINEs, LTR elements, and others.
- Produces a table annotating the masked sequences and a table that identifies families of repetitive elements in the query sequence.
- Masks repetitive and low-complexity sequences prior to database searches.
- Accepts query sequences of any size.
- Helps in designing primers or oligonucleotide probes from sequence data.
- Tests for primate or rodent DNA contamination.
- Removes E. coli transposon or insertion element sequences from a DNA sequence.
- Can limit masking to low-complexity DNA, Alus, interspersed repeats, or non-RNA sequences.
- Can restrict masking to young insertion elements by setting an upper limit on the divergence of a match.
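The masking restrictions in the last two items can be sketched as command lines. This is a minimal sketch, not a definitive recipe: rm_mask_cmd and its mode names are hypothetical helpers invented for illustration, and the flags (-noint, -alu, -nolow, -norna, -div) are taken from RepeatMasker's built-in help as commonly distributed, so check them against the installed version before relying on them. The function only prints the command and does not run RepeatMasker.

```bash
#!/bin/bash
# rm_mask_cmd MODE FASTA [MAX_DIV]
# Hypothetical helper: prints (does not run) a repeatmasker command line.
# Assumption: the flags below match the installed RepeatMasker's help page.
rm_mask_cmd() {
    local mode="$1" fasta="$2" maxdiv="${3:-}"
    local flags
    case "$mode" in
        lowcomplexity) flags="-noint" ;;  # only low-complexity/simple repeats
        alu)           flags="-alu"   ;;  # only Alus (primate DNA)
        interspersed)  flags="-nolow" ;;  # skip low-complexity DNA
        nonrna)        flags="-norna" ;;  # leave small RNA genes unmasked
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
    # Optional divergence cutoff: mask only repeats below MAX_DIV percent.
    if [ -n "$maxdiv" ]; then
        flags="$flags -div $maxdiv"
    fi
    printf 'repeatmasker %s %s\n' "$flags" "$fasta"
}

# Example: Alu-only masking restricted to matches under 10% divergence.
rm_mask_cmd alu ay001401.fasta 10
```

Printing the command instead of executing it makes it easy to review the flags first. File names containing spaces are not handled by this sketch.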
Type 'repeatmasker' with no parameters to see the currently installed version of RepeatMasker, along with a brief help page.
On Helix/Biowulf, RepeatMasker has been configured to use NCBI BLAST (i.e. RMBlast) as the default search engine. To use cross_match instead, select it with the -e flag.
-e(ngine) [crossmatch|wublast|abblast|ncbi|hmmer|decypher]
Use an alternate search engine to the default.
Note that wublast, abblast, hmmer and decypher are not configured as search engines on Helix/Biowulf. The only valid choices are:
-e ncbi
-e crossmatch
Please contact staff@helix.nih.gov if you have a particular need for a different search engine.
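The engine restriction described above can be checked before a job is submitted. The following is a minimal sketch: rm_engine_cmd is a hypothetical wrapper name, not part of RepeatMasker, and it only prints the command it would run.

```bash
#!/bin/bash
# rm_engine_cmd ENGINE FASTA
# Hypothetical helper: accepts only the two engines configured on
# Helix/Biowulf (ncbi, crossmatch) and prints the matching command line.
rm_engine_cmd() {
    local engine="$1" fasta="$2"
    case "$engine" in
        ncbi|crossmatch)
            printf 'repeatmasker -e %s %s\n' "$engine" "$fasta"
            ;;
        *)
            echo "unsupported engine: $engine (use ncbi or crossmatch)" >&2
            return 1
            ;;
    esac
}

# Example: build the cross_match command used in the sample session.
rm_engine_cmd crossmatch ay001401.fasta
```

Rejecting the other engine names up front avoids starting a run that cannot complete on this system.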
In this sample session, a sequence is obtained by using the EMBOSS seqret program. This sequence is then analyzed with RepeatMasker using the default (NCBI Blast) search engine, and then again using Crossmatch as the search engine.
helix% emboss
[...]
[user@helix ~]$ seqret
Reads and writes (returns) sequences
Input (gapped) sequence(s): genbank:ay001401
output sequence(s) [ay001401.fasta]:
[user@helix ~]$ repeatmasker ay001401.fasta
RepeatMasker version open-4.0.0
Search Engine: NCBI/RMBLAST [ 2.2.27+ ]
Master RepeatMasker Database: /usr/local/apps/RepeatMasker/4.0.0/Libraries/RepeatMaskerLib.embl ( Complete Database: 20120418 )
Building species libraries in: /usr/local/apps/RepeatMasker/4.0.0/Libraries/20120418/homo_sapiens
  - 1860 ancestral and ubiquitous sequence(s) for homo sapiens
  - 9 lineage specific sequence(s) for homo sapiens
analyzing file ay001401.fasta
Checking for E. coli insertion elements
identifying Simple Repeats in batch 1 of 1
identifying full-length ALUs in batch 1 of 1
identifying full-length interspersed repeats in batch 1 of 1
identifying remaining ALUs in batch 1 of 1
identifying most interspersed repeats in batch 1 of 1
identifying long interspersed repeats in batch 1 of 1
identifying ancient repeats in batch 1 of 1
identifying retrovirus-like sequences in batch 1 of 1
identifying tough LINE1s in batch 1 of 1
identifying Simple Repeats in batch 1 of 1
No repetitive sequences were detected in ay001401.fasta
[user@helix ~]$ repeatmasker -e crossmatch ay001401.fasta
RepeatMasker version open-4.0.0
Search Engine: Crossmatch [ 1.090518 ]
Master RepeatMasker Database: /usr/local/apps/RepeatMasker/4.0.0/Libraries/RepeatMaskerLib.embl ( Complete Database: 20120418 )
analyzing file ay001401.fasta
Checking for E. coli insertion elements
identifying Simple Repeats in batch 1 of 1
identifying full-length ALUs in batch 1 of 1
identifying full-length interspersed repeats in batch 1 of 1
identifying remaining ALUs in batch 1 of 1
identifying most interspersed repeats in batch 1 of 1
identifying long interspersed repeats in batch 1 of 1
identifying ancient repeats in batch 1 of 1
identifying retrovirus-like sequences in batch 1 of 1
identifying tough LINE1s in batch 1 of 1
identifying Simple Repeats in batch 1 of 1
No repetitive sequences were detected in ay001401.fasta
helix%
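When the session above is run non-interactively, the screen output can be saved to a file and checked for the final status line. This is a minimal sketch: has_no_repeats and the log file are hypothetical names, and the demo below writes a stand-in log with printf so that it runs without RepeatMasker installed. In real use the log would come from something like "repeatmasker ay001401.fasta > rm.log 2>&1", which captures both output streams because this page does not say which one carries the message.

```bash
#!/bin/bash
# has_no_repeats LOGFILE
# Hypothetical helper: succeeds if the captured RepeatMasker screen output
# contains the "No repetitive sequences were detected" status line.
has_no_repeats() {
    grep -q 'No repetitive sequences were detected' "$1"
}

# Demo with a stand-in log file (two lines copied from the sample session).
log=$(mktemp)
printf '%s\n' \
    'analyzing file ay001401.fasta' \
    'No repetitive sequences were detected in ay001401.fasta' > "$log"

if has_no_repeats "$log"; then
    echo "no repeats reported"
else
    echo "repeats were found; inspect the output tables"
fi
rm -f "$log"
```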

