High-Performance Computing at the NIH

Scientific Databases

Jump to [EMBOSS] [Blast] [Fasta] [Mascot] [PDB] [CSD] [PFAM] [MySQL] [vcf files] [BAM]

Columns: Database, Type, Location on the Helix Systems, Last Updated

EMBOSS databases

Accessible via: EMBOSS web interface, EMBOSS command-line
EST
EST division of Genbank
Nuc /fdb/embossdb/est.new 18 Apr 2013

(Updated bimonthly after Genbank release
Source:NCBI )

Gb_New
All sequences added to Genbank since last major release
Nuc /fdb/embossdb/gbnew.new 23 May 2013

(Updated daily
Source:NCBI )

Genbank
The NIH Genetic Sequence Database, an annotated collection of all publicly available DNA sequences. More information at NCBI.
Nuc /fdb/embossdb/genbank.new 17 Apr 2013

(Updated bimonthly after Genbank release
Source:NCBI )

Refseqnt
NCBI's comprehensive, integrated, non-redundant set of sequences, including genomic DNA and transcripts (RNA), for major research organisms.
Nuc /fdb/embossdb/refseqnt.new 08 May 2013

(Updated weekly
Source:NCBI)

Prints
Protein fingerprints, groups of conserved motifs used to characterize a protein family.
Patterns used internally by Emboss 22 Apr 2013

(Updated after new Prints release
Source:EBI)

Prosite
A database/dictionary of protein sites and patterns. More information at Expasy.
Patterns used internally by Emboss 15 May 2013

(Updated every 2 months
Source:Expasy )

REBASE
Information about restriction enzymes, recognition sequences, cleavage sites... More information at REBASE.
Enzymes used internally by Emboss 22 May 2013

(Updated every month
Source:REBASE )

GenPept
GenPept is produced by parsing the corresponding GenBank release for translated coding regions of GenBank sequences. More information at NCI, Frederick
Prot /fdb/embossdb/genpept.new 12 Dec 2012

(Updated bimonthly after Genbank release
Source:NCIFCRF)

GP_New
All sequences added to GenPept since last major release
Prot /fdb/embossdb/gpnew.new 02 Oct 2012

(Updated daily
Source:NCIFCRF )

Refseqaa
NCBI's comprehensive, integrated, non-redundant set of protein sequences for major research organisms.
Prot /fdb/embossdb/refseqaa.new 08 May 2013

(Updated weekly
Source:NCBI)

UniProt
(Swissprot + Trembl) A highly-annotated, curated protein sequence database. Minimal redundancy and high level of integration with other databases. More information at Expasy
Prot /fdb/embossdb/uniprot 01 May 2013

(Updated weekly
Source:Uniprot)

Blast databases

Accessible via: Blast (Helix), Blast (Biowulf)
Drosophila
Drosophila sequences
Nuc /fdb/blastdb/drosoph.nt 26 Sep 2011

(Updated weekly
Source:NCBI )

EST - human
Human sequences from the EST division of Genbank
Nuc /fdb/blastdb/est_human 23 May 2012

(Updated weekly
Source:NCBI )

EST - mouse
Mouse sequences from the EST division of Genbank.
Nuc /fdb/blastdb/est_mouse 23 May 2012

(Updated weekly
Source:NCBI )

EST - others
Non-human, non-mouse sequences from the EST division of Genbank
Nuc /fdb/blastdb/est_others 23 May 2012

(Updated weekly
Source:NCBI )

HTGs
High throughput genome sequences
Nuc /fdb/blastdb/htgs 21 Apr 2013

(Updated weekly
Source:NCBI )

Human Genome hg18
Build 36, hg18 (Apr 2006) from the International Human Genome Consortium
Nuc /fdb/genome/human-apr2006/hs_genome 20 May 2011

(Updated after new build release
Source:UCSC )

Human Genome hg19
Build 37, hg19 (Feb 2009) from the International Human Genome Consortium
Nuc /fdb/blastdb/hs_genome 02 May 2013

(Updated after new build release
Source:UCSC )

Human Genome RNA hg18
Build 36, hg18 (Apr 2006) from the International Human Genome Consortium
Nuc /fdb/genome/human-apr2006/hs_genome.rna 28 Apr 2006

(Updated after build release
Source:NCBI )

Human Genome RNA hg19
Build 37, hg19 (Feb 2009) from the International Human Genome Consortium
Nuc /fdb/blastdb/hs_genome.rna 05 Nov 2012

(Updated after build release
Source:NCBI )

Mito
Mitochondrial sequences
Nuc /fdb/blastdb/mito.nt 20 May 2013

(Updated weekly
Source:NCBI )

Mouse Genome mm8
Build 36, mm8, Mar 2006 from the Mouse Genome Consortium
Nuc /fdb/genome/mouse-mar2006/mouse_genome 09 Nov 2006

(Updated after new build release
Source:UCSC )

Mouse Genome mm9
Build 37, mm9, Jul 2007 from the Mouse Genome Consortium
Nuc /fdb/blastdb/mouse_genome 25 Mar 2008

(Updated after new build release
Source:UCSC )

Mouse Genome RNA mm8
Build 36, mm8, Mar 2006 from the Mouse Genome Consortium
Nuc /fdb/genome/mouse-mar2006/mouse_genome.rna 09 Nov 2006

(Updated after release
Source:NCBI )

Mouse Genome RNA mm9
Build 37, mm9, Jul 2007 from the Mouse Genome Consortium
Nuc /fdb/blastdb/mouse_genome.rna 22 Oct 2012

(Updated after release
Source:NCBI )

NCBI nt
All GenBank+EMBL+DDBJ (but no EST, STS, GSS, HTG). No longer nonredundant.
Nuc /fdb/blastdb/nt 15 May 2013

(Updated weekly
Source:NCBI )

Protein Data Bank
An archive of experimentally determined three-dimensional structures of biological macromolecules. More information at the PDB.
Nuc /fdb/blastdb/pdbnt 16 May 2013

(Updated weekly
Source:NCBI )

Refseq Human Genomic
Refseq Human (NC_######) chromosome records with gap adjusted concatenated NT_ contigs
Nuc /fdb/blastdb/human_genomic 18 May 2013

(Updated weekly
Source:NCBI)

Refseq Human RNA
A comprehensive, integrated, non-redundant set of sequences. More info at NCBI
Nuc /fdb/blastdb/human.rna 20 May 2013

(Updated weekly
Source:NCBI)

Refseq Mouse RNA
A comprehensive, integrated, non-redundant set of sequences. More info at NCBI
Nuc /fdb/blastdb/mouse.rna 20 May 2013

(Updated weekly
Source:NCBI)

Refseq Other Genomic
RefSeq chromosome records (NC_######) for organisms other than human
Nuc /fdb/blastdb/other_genomic 28 Feb 2013

(Updated weekly
Source:NCBI )

Yeast
Yeast sequences
Nuc /fdb/blastdb/yeast.nt 26 Sep 2011

(Updated weekly
Source:NCBI )

Drosophila
Drosophila sequences
Prot /fdb/blastdb/drosoph.aa 26 Sep 2011

(Updated weekly
Source:NCBI )

Human Genome Proteins hg18
Build 36, hg18 (Apr 2006) from the International Human Genome Consortium
Prot /fdb/genome/human-apr2006/hs_genome.protein 28 Apr 2006

(Updated after build release
Source:NCBI )

Human Genome Proteins hg19
Build 37, hg19 (Feb 2009) from the International Human Genome Consortium
Prot /fdb/blastdb/hs_genome.protein 05 Nov 2012

(Updated after build release
Source:NCBI )

Mito
Mitochondrial sequences
Prot /fdb/blastdb/mito.aa 20 May 2013

(Updated weekly
Source:NCBI )

Mouse Genome Proteins mm8
Build 36, mm8, Mar 2006 from the Mouse Genome Consortium
Prot /fdb/genome/mouse-mar2006/mouse_genome.protein 09 Nov 2006

(Updated weekly
Source:NCBI )

Mouse Genome Proteins mm9
Build 37, mm9, Jul 2007 from the Mouse Genome Consortium
Prot /fdb/genome/mouse-mar2006/mouse_genome.protein 09 Nov 2006

(Updated weekly
Source:NCBI )

NCBI nr
NCBI's nonredundant Genbank CDS translations + PDB + SwissProt
Prot /fdb/blastdb/nr 25 Apr 2013

(Updated weekly
Source:NCBI )

Protein Data Bank
An archive of experimentally determined three-dimensional structures of biological macromolecules. More information at the PDB.
Prot /fdb/blastdb/pdbaa 16 May 2013

(Updated weekly
Source:NCBI )

Refseq Human Proteins
A comprehensive, integrated, non-redundant set of sequences. More info at NCBI
Prot /fdb/blastdb/human.protein 20 May 2013

(Updated weekly
Source:NCBI)

Refseq Mouse Proteins
A comprehensive, integrated, non-redundant set of sequences. More info at NCBI
Prot /fdb/blastdb/mouse.protein 20 May 2013

(Updated weekly
Source:NCBI)

SwissProt
A highly-annotated, curated protein sequence database. Minimal redundancy and high level of integration with other databases. More information at Expasy
Prot /fdb/blastdb/swissprot 20 May 2013

(Updated weekly
Source:NCBI)

Yeast
Yeast sequences
Prot /fdb/blastdb/yeast.aa 26 Sep 2011

(Updated weekly
Source:NCBI )
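
As an illustration of how the locations above might be used, here is a minimal Python sketch that builds a nucleotide search command against one of the listed databases. It assumes the NCBI BLAST+ suite (the `blastn` program with its `-db`, `-query`, `-out`, and `-outfmt` options) is what is installed; this page does not say which BLAST version Helix or Biowulf provide. The file names `query.fa` and `hits.tsv` are hypothetical.

```python
import shlex

# Directory taken from the Location column of the Blast table above.
BLAST_DB_DIR = "/fdb/blastdb"


def blastn_command(db_name, query_path, out_path):
    """Return a BLAST+ blastn command as an argument list.

    Assumption: BLAST+ is available. db_name is a nucleotide database
    from the table (for example "nt" or "est_human").
    """
    return [
        "blastn",
        "-db", f"{BLAST_DB_DIR}/{db_name}",
        "-query", query_path,   # hypothetical FASTA query file
        "-out", out_path,       # hypothetical output file
        "-outfmt", "6",         # tabular output
    ]


if __name__ == "__main__":
    # Print the command so it can be pasted into a shell or batch script.
    print(shlex.join(blastn_command("nt", "query.fa", "hits.tsv")))
```

The protein databases in the table (type Prot) would need a protein search program such as `blastp` instead; the argument layout is otherwise the same.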

Fasta databases

Accessible via: Fasta, BLAT.
Drosophila
Drosophila sequences
Prot /fdb/fastadb/drosoph.aa.fas 04 Sep 2012

(Updated weekly
Source:NCBI )

Human Genome Proteins hg19
Build 37, hg19 (Feb 2009) from the International Human Genome Consortium
Prot /fdb/fastadb/hs_genome.protein.fas 12 Apr 2010

(Updated after build release
Source:NCBI )

Mito
Mitochondrial sequences
Prot /fdb/fastadb/mito.aa.fas 21 May 2013

(Updated weekly
Source:NCBI )

Mouse Genome Proteins mm8
Build 36, mm8, Mar 2006 from the Mouse Genome Consortium
Prot /fdb/genome/mouse-mar2006/mouse_genome.protein.fas 09 Nov 2006

(Updated weekly
Source:NCBI )

Mouse Genome Proteins mm9
Build 37, mm9, Jul 2007 from the Mouse Genome Consortium
Prot /fdb/fastadb/mouse_genome.protein.fas 25 Mar 2008

(Updated weekly
Source:NCBI )

NCBI nr
NCBI's nonredundant Genbank CDS translations + PDB + SwissProt
Prot /fdb/fastadb/nr.aa.fas 21 May 2013

(Updated weekly
Source:NCBI )

Protein Data Bank
An archive of experimentally determined three-dimensional structures of biological macromolecules. More information at the PDB.
Prot /fdb/fastadb/pdb.aa.fas 21 May 2013

(Updated weekly
Source:NCBI )

Refseq Human Proteins
A comprehensive, integrated, non-redundant set of sequences. More info at NCBI
Prot /fdb/fastadb/ref.human.protein.fas 21 May 2013

(Updated weekly
Source:NCBI)

Refseq Mouse Proteins
A comprehensive, integrated, non-redundant set of sequences. More info at NCBI
Prot /fdb/fastadb/ref.mouse.protein.fas 21 May 2013

(Updated weekly
Source:NCBI)

SwissProt
A highly-annotated, curated protein sequence database. Minimal redundancy and high level of integration with other databases. More information at Expasy
Prot /fdb/fastadb/swissprot.aa.fas 21 May 2013

(Updated weekly
Source:NCBI )

Yeast
Yeast sequences
Prot /fdb/fastadb/yeast.aa.fas 30 Jun 2011

(Updated weekly
Source:NCBI )

Drosophila
Drosophila sequences
Nuc /fdb/fastadb/drosoph.nt.fas 04 Sep 2012

(Updated weekly
Source:NCBI )

EST - human
Human sequences from the EST division of Genbank.
Nuc /fdb/fastadb/est_human.fas 21 May 2013

(Updated weekly
Source:NCBI )

EST - mouse
Mouse sequences from the EST division of Genbank.
Nuc /fdb/fastadb/est_mouse.fas 21 May 2013

(Updated weekly
Source:NCBI )

Human Genome hg18
Build 36, hg18 (Apr 2006) from the International Human Genome Consortium
Nuc /fdb/genome/human-apr2006/ 20 May 2011

(Updated after new build release
Source:UCSC)

Human Genome hg19
Build 37, hg19 (Feb 2009) from the International Human Genome Consortium
Nuc /fdb/genome/human-feb2009/ 02 May 2013

(Updated after new build release
Source:UCSC)

Human Genome RNA hg18
Build 36, hg18 (Apr 2006) from the International Human Genome Consortium
Nuc /fdb/genome/human-apr2006/hs_genome.rna.fas 28 Apr 2006

(Updated after build release
Source:NCBI )

Human Genome RNA hg19
Build 37, hg19 (Feb 2009) from the International Human Genome Consortium
Nuc /fdb/fastadb/hs_genome.rna.fas 12 Apr 2010

(Updated after build release
Source:NCBI )

Mito
Mitochondrial sequences
Nuc /fdb/fastadb/mito.nt.fas 21 May 2013

(Updated weekly
Source:NCBI )

Mouse Genome mm8
Build 36, mm8, Mar 2006 from the Mouse Genome Consortium
Nuc /fdb/genome/mouse-mar2006/ 08 Jul 2010

(Updated after new build release
Source:UCSC )

Mouse Genome mm9
Build 37, mm9, Jul 2007 from the Mouse Genome Consortium
Nuc /fdb/genome/mouse-jul2007/ 06 Apr 2011

(Updated after new build release
Source:UCSC )

Mouse Genome RNA mm9
Build 37, mm9, Jul 2007 from the Mouse Genome Consortium
Nuc /fdb/fastadb/mouse_genome.rna.fas 25 Mar 2008

(Updated after release
Source:NCBI )

NCBI nt
All GenBank+EMBL+DDBJ (but no EST, STS, GSS, HTG). No longer nonredundant.
Nuc /fdb/fastadb/nt.fas 21 May 2013

(Updated weekly
Source:NCBI )

Protein Data Bank
An archive of experimentally determined three-dimensional structures of biological macromolecules. More information at the PDB.
Nuc /fdb/fastadb/pdb.nt.fas 21 May 2013

(Updated weekly
Source:NCBI )

Refseq Human Genomic
Refseq Human (NC_######) chromosome records with gap adjusted concatenated NT_ contigs
Nuc /fdb/fastadb/ref.human.genomic.fas 21 May 2013

(Updated weekly
Source:NCBI)

Refseq Human RNA
A comprehensive, integrated, non-redundant set of sequences. More info at NCBI
Nuc /fdb/fastadb/ref.human.rna.fas 21 May 2013

(Updated weekly
Source:NCBI)

Refseq Mouse RNA
A comprehensive, integrated, non-redundant set of sequences. More info at NCBI
Nuc /fdb/fastadb/ref.mouse.rna.fas 21 May 2013

(Updated weekly
Source:NCBI)

Refseq Other Genomic
RefSeq chromosome records (NC_######) for organisms other than human
Nuc /fdb/fastadb/ref.other.genomic.fas 21 May 2013

(Updated weekly
Source:NCBI)

Yeast
Yeast sequences
Nuc /fdb/fastadb/yeast.nt.fas 04 Sep 2012

(Updated weekly
Source:NCBI )
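
The `.fas` files listed above can be inspected with a few lines of Python. The sketch below counts sequences by counting header lines, which begin with `>` in FASTA format. It assumes the files are uncompressed plain text, which the `.fas` extension suggests but this page does not confirm; the function names are illustrative only.

```python
def count_fasta_records(lines):
    """Count FASTA records in an iterable of text lines.

    Each record starts with a header line beginning with '>'.
    """
    return sum(1 for line in lines if line.startswith(">"))


def count_fasta_file(path):
    """Count records in a FASTA file (assumed uncompressed text)."""
    # Read line by line so that large databases are never loaded
    # into memory all at once.
    with open(path, "r", errors="replace") as handle:
        return count_fasta_records(handle)


if __name__ == "__main__":
    # Path taken from the table above; only valid on the Helix systems.
    print(count_fasta_file("/fdb/fastadb/yeast.aa.fas"))
```

Streaming the file keeps memory use constant, which matters for the larger entries such as `nt.fas`.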

Mascot databases

Accessible via Mascot search engine
MSDB
A nonredundant protein sequence database designed specifically for mass-spec applications.
Prot biospec.nih.gov 01 Jun 2010

(Updated weekly
Source:NCBI )

NCBI nr
NCBI's nonredundant Genbank CDS translations + PDB + SwissProt
Prot biospec.nih.gov 19 May 2013

(Updated weekly
Source:NCBI )

NIH-Specific
A collection of NIH-specific databases requested by NIH Mascot users.
Prot biospec.nih.gov 15 May 2013

(Updated as requested
Source:NIH)

Sp_Trembl
SwissProt + Trembl (a computer-annotated supplement of SwissProt)
Prot biospec.nih.gov 05 May 2013

(Updated weekly
Source:Expasy )

SwissProt
A highly-annotated, curated protein sequence database. Minimal redundancy and high level of integration with other databases. More information at Expasy
Prot biospec.nih.gov 05 May 2013

(Updated weekly
Source:Expasy )

PDB databases

Accessible via Molecules R Us or direct access to coordinate files.
NIH users can NFS-mount the PDB databases on their own machines -- contact staff@helix.nih.gov for more info.
Protein Data Bank
An archive of experimentally determined three-dimensional structures of biological macromolecules. More information at the PDB.
3-D /pdb/pdb 24 May 2013

(Updated daily
Source:PDB )

CSD databases

Accessible via Quest
Cambridge Structural Database
Crystal structure information for over 165,000 organic and organometallic compounds. More info at CCDC.
3-D /local/csd 16 Jan 2013

(Updated every 3 months
Source:CCDC)

PFAM databases

Accessible via HMMER (Biowulf, Helix)
PFAM
A collection of multiple sequence alignments and hidden Markov models. More information at PFAM home page
Families /fdb/fastadb/pfam 23 Mar 2009

(Updated every 3 months
Source:PFAM )

MySQL databases

Accessible via the NIH mirror of the UCSC Genome Browser; also available for direct MySQL queries from the Biowulf cluster nodes.
Chicken Genome
May 2006 assembly from WUSTL.
Nuc NIH mirror of UCSC Genome Browser 05 Apr 2013

(Updated weekly
Source:UCSC )

Cow Genome
Mar 2005 assembly from the Baylor Sequencing Center
Nuc NIH mirror of UCSC genome browser 31 Jan 2012

(Updated weekly
Source:UCSC )

Dog Genome
May 2005 assembly from the Broad Institute
Nuc NIH mirror of UCSC genome browser 25 Jan 2013

(Updated weekly
Source:UCSC )

Human Genome hg18
Build 36, hg18 (Apr 2006) from the International Human Genome Consortium
Nuc NIH mirror of UCSC Genome Browser 17 May 2013

(Updated weekly
Source:UCSC )

Human Genome hg19
Build 37, hg19 (Feb 2009) from the International Human Genome Consortium
Nuc NIH mirror of UCSC Genome Browser 17 May 2013

(Updated weekly
Source:UCSC )

Mouse Genome mm8
Build 36, mm8, Mar 2006 from the Mouse Genome Consortium
Nuc NIH mirror of UCSC Genome Browser 17 May 2013

(Updated weekly
Source:UCSC )

Mouse Genome mm9
Build 37, mm9, Jul 2007 from the Mouse Genome Consortium
Nuc NIH mirror of UCSC Genome Browser 17 May 2013

(Updated weekly
Source:UCSC )

Rat Genome
May 2006 build, rn4, from the Rat Genome Sequencing Consortium
Nuc NIH mirror of UCSC genome browser 10 May 2013

(Updated weekly
Source:UCSC )

Rhesus genome
Jan 2006 assembly from the Baylor Sequencing Center.
Nuc NIH mirror of UCSC genome browser 25 Jan 2013

(Updated weekly
Source:UCSC )

Zebrafish genome
Mar 2006 assembly from the Sanger Center.
Nuc NIH mirror of UCSC genome browser 29 Nov 2011

(Updated weekly
Source:UCSC )

Chicken Genome
May 2006 assembly from WUSTL.
Annotations NIH mirror of UCSC Genome Browser 05 Apr 2013

(Updated weekly
Source:UCSC )

Cow Genome
Mar 2005 assembly from the Baylor Sequencing Center
Annotations NIH mirror of UCSC genome browser 31 Jan 2012

(Updated weekly
Source:UCSC )

Dog Genome
May 2005 assembly from the Broad Institute
Annotations NIH mirror of UCSC genome browser 25 Jan 2013

(Updated weekly
Source:UCSC )

Drosophila genome
April 2006 assembly
Annotations NIH mirror of UCSC genome browser 10 May 2013

(Updated weekly
Source:UCSC)

Human Genome hg18
Build 36, hg18 (Apr 2006) from the International Human Genome Consortium
Annotations NIH mirror of UCSC Genome Browser 17 May 2013

(Updated weekly
Source:UCSC)

Human Genome hg19
Build 37, hg19 (Feb 2009) from the International Human Genome Consortium
Annotations NIH mirror of UCSC Genome Browser 17 May 2013

(Updated weekly
Source:UCSC)

Mouse Genome mm8
Build 36, mm8, Mar 2006 from the Mouse Genome Consortium
Annotations NIH mirror of UCSC Genome Browser 17 May 2013

(Updated weekly
Source:UCSC )

Mouse Genome mm9
Build 37, mm9, Jul 2007 from the Mouse Genome Consortium
Annotations NIH mirror of UCSC Genome Browser 17 May 2013

(Updated weekly
Source:UCSC )

Rat Genome
May 2006 build, rn4, from the Rat Genome Sequencing Consortium
Annotations NIH mirror of UCSC genome browser 10 May 2013

(Updated weekly
Source:UCSC )

Rhesus genome
Jan 2006 assembly from the Baylor Sequencing Center.
Annotations NIH mirror of UCSC genome browser 25 Jan 2013

(Updated weekly
Source:UCSC )

Zebrafish genome
Mar 2006 assembly from the Sanger Center.
Annotations NIH mirror of UCSC genome browser 29 Nov 2011

(Updated weekly
Source:UCSC )

Chicken Genome
May 2006 assembly from WUSTL.
Prot NIH mirror of UCSC Genome Browser 05 Apr 2013

(Updated weekly
Source:UCSC )

Cow Genome
Aug 2006 assembly from the Baylor Sequencing Center
Prot NIH mirror of UCSC genome browser 31 Jan 2012

(Updated weekly
Source:UCSC )

Dog Genome
May 2005 assembly from the Broad Institute
Prot NIH mirror of UCSC genome browser 25 Jan 2013

(Updated weekly
Source:UCSC )

Human Genome hg18
Build 36, hg18 (Apr 2006) from the International Human Genome Consortium
Prot NIH mirror of UCSC Genome Browser 17 May 2013

(Updated weekly
Source:UCSC )

Human Genome hg19
Build 37, hg19 (Feb 2009) from the International Human Genome Consortium
Prot NIH mirror of UCSC Genome Browser 17 May 2013

(Updated weekly
Source:UCSC )

Mouse Genome mm8
Build 36, mm8, Mar 2006 from the Mouse Genome Consortium
Prot NIH mirror of UCSC Genome Browser 17 May 2013

(Updated weekly
Source:UCSC )

Mouse Genome mm9
Build 37, mm9, Jul 2007 from the Mouse Genome Consortium
Prot NIH mirror of UCSC Genome Browser 17 May 2013

(Updated weekly
Source:UCSC )

Rat Genome
May 2006 build, rn4, from the Rat Genome Sequencing Consortium
Prot NIH mirror of UCSC genome browser 10 May 2013

(Updated weekly
Source:UCSC )

Rhesus genome
Jan 2006 assembly from the Baylor Sequencing Center.
Prot NIH mirror of UCSC genome browser 25 Jan 2013

(Updated weekly
Source:UCSC )

Zebrafish genome
Mar 2006 assembly from the Sanger Center.
Prot NIH mirror of UCSC genome browser 29 Nov 2011

(Updated weekly
Source:UCSC )

vcf files databases

1000 Genomes
20100804 release containing analysis results sets (vcfs) and README files.
Nuc /fdb/1000genomes/ 01 Apr 2013

(Updated occasionally
Source:NCBI)
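
For readers who want to read the VCF files directly, here is a minimal Python sketch of a parser for the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It skips the `##` meta-information lines and the `#CHROM` header line. The individual file names under /fdb/1000genomes/ are not listed on this page, and whether the files are gzip-compressed is an assumption handled by the hypothetical `open_vcf` helper.

```python
import gzip

# The eight fixed, tab-separated columns defined by the VCF format.
VCF_FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_lines(lines):
    """Yield one dict per variant record from an iterable of VCF lines."""
    for line in lines:
        # Skip blank lines, '##' meta lines, and the '#CHROM' header.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        record = dict(zip(VCF_FIXED_COLUMNS, fields[:8]))
        record["POS"] = int(record["POS"])  # 1-based position
        yield record


def open_vcf(path):
    """Open a VCF file as text; assumes '.gz' means gzip compression."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")
```

Per-sample genotype columns, when present, follow the eight fixed columns and are ignored by this sketch.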

BAM databases

1000 Genomes
20100804 release containing analysis results sets (vcfs) and README files.
Alignment /fdb/1000genomes/ftp/data/ 20 May 2013

(Updated occasionally
Source:NCBI)