Sequence Format Converters on Helix

There are several programs available to convert nucleotide and protein sequences from one format to another.
- EMBOSS seqret. Reads and writes sequences in GCG, EMBL, Swissprot, Fasta, Genbank, PIR, Clustal, Phylip, Staden, raw, and plain formats.
- Fmtseq: an extension of Don Gilbert's Readseq with a user-friendly interface.
- EMBOSS seqret (a web tool). Paste in a sequence or upload it from your local desktop system, select an output format, and click 'Run'.
- EMBOSS seqretsplit (a web tool). The same as seqret, except that when it writes out more than one sequence, it writes each sequence to an individual file. Its main use is therefore to split a file containing multiple sequences into many files, each containing one sequence.
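To make the splitting behavior described for seqretsplit concrete, here is a minimal shell sketch that produces a similar effect for Fasta input only. It is not seqretsplit and does not use EMBOSS: it uses awk instead. The helper name `split_fasta` and the file-naming scheme (lower-cased sequence ID plus `.fasta`) are assumptions made for this illustration.

```shell
#!/bin/sh
# Illustration only: split a multi-sequence Fasta file into one file per record.
# This is NOT seqretsplit; it mimics the effect for Fasta input using awk.
# split_fasta is a hypothetical helper name.
split_fasta() {
    infile=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        /^>/ {
            if (out != "") close(out)          # finish the previous record
            id = substr($1, 2)                 # first word after ">"
            gsub(/[^A-Za-z0-9._-]/, "_", id)   # keep file names safe
            out = dir "/" tolower(id) ".fasta" # assumed naming scheme
        }
        out != "" { print > out }              # header and sequence lines
    ' "$infile"
}
```

For example, `split_fasta many.fasta split_out` would write one file per record into `split_out`. Records that share an ID would overwrite each other in this sketch.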
How to use

The EMBOSS 'seqret' and 'seqretsplit' tools are available on both Helix and Biowulf. Typically, users will be reformatting sequences on Helix. If a large number of sequences needs to be reformatted as part of a Biowulf batch job, the EMBOSS commands can be inserted into a batch script.
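As a rough sketch of what such a batch script might contain, the shell function below runs seqret non-interactively over every Genbank file in a directory and writes Fasta output. It assumes the EMBOSS environment has already been initialized in the job (that site-specific step is not shown), that input files end in `.gb`, and that the standard seqret qualifiers `-sequence`, `-outseq`, and `-auto` are available. The names `convert_dir` and `DRY_RUN` are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Sketch: convert every Genbank file in a directory to Fasta with seqret.
# Assumes EMBOSS is already initialized and inputs use a .gb extension.
# convert_dir and DRY_RUN are hypothetical names for this example.
convert_dir() {
    indir=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    for f in "$indir"/*.gb; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        base=$(basename "$f" .gb)
        # -auto suppresses the interactive prompts;
        # the format::file syntax selects the output format
        set -- seqret -sequence "$f" -outseq "fasta::$outdir/$base.fasta" -auto
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$*"                    # print the command instead of running it
        else
            "$@" || echo "seqret failed for $f" >&2
        fi
    done
}
```

Calling `DRY_RUN=1 convert_dir genbank_files fasta_files` prints the commands without running them, which is a cheap way to check the file names before submitting the job.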
Sample run with seqret on the command line, to convert a Genbank sequence into Swissprot format. (User input follows each prompt.)
helix% emboss     (initializes EMBOSS)
[...]
helix% seqret
Reads and writes (returns) sequences
Input (gapped) sequence(s): a00006.gb_pat
output sequence(s) [a00006.fasta]: swissprot::a00006.swiss
helix% more a00006.swiss
ID   A00006 standard; DNA; UNC; 26 BP.
SQ   Sequence 26 BP; 5 A; 10 C; 8 G; 3 T; 0 other;
     CAGGCGCTCG ATCGATCGCG CCAACG        26
//
helix%
Sample session with seqret to convert a GCG-format sequence into Fasta format.
helix% seqret
Reads and writes (returns) sequences
Input (gapped) sequence(s): nuc.gcg
output sequence(s) [nuc.fasta]:
helix%