The applications below are labelled according to
the platform on which they are available. The platforms include:
Helix [Helix] is the front-end machine for the Helix Systems at NIH, and is to be used for interactive and general computational tasks.
The NIH Biowulf cluster [Biowulf] is a Linux parallel processing system that is for computationally challenging and batch jobs. Most applications that run on Helix can also run on the Biowulf cluster. Helix users are required to register for a Biowulf account in order to use the cluster.
Web-based applications [Web]: The Helix Systems supports and develops a variety of web-based tools which are available to the intramural NIH community. Some tools require NIH authentication.
Helix is the recommended system for interactive use and for short jobs or small numbers of jobs. Users who need to run computationally intensive or memory intensive jobs should apply for a Biowulf account. If you are unsure about which system is best suited to your needs or applications, contact the Helix staff at email@example.com.
Functional annotation of genetic variants from high-throughput sequencing data.
Blast is a sequence database searching program which compares a nucleotide or protein query sequence against all sequences in a database.
BLAT is a DNA/Protein Sequence Analysis program that is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more. Available on Biowulf.
Clustal-Omega is a general purpose multiple sequence alignment (MSA) program for proteins and DNA/RNA. It produces high quality MSAs and is capable of handling data-sets of hundreds of thousands of sequences in reasonable time.
ClustalW is a general-purpose multiple alignment program for DNA or protein sequences.
DNAWorks (3.2.2) [Web]
DNAWorks is a computer program that automates the design of oligonucleotides for gene synthesis by PCR-based gene assembly. The program requires simple input information: an amino acid sequence of the target protein or a DNA sequence, and a desired annealing temperature.
EMBOSS (The European Molecular Biology Open Software Suite) is a nucleotide/protein sequence analysis package specially developed for the needs of the molecular biology user community. See also Equivalent tools for GCG programs.
The fasta program package contains many programs for searching DNA and protein databases and one program (prss) for evaluating statistical significance from randomly shuffled sequences.
HMMER uses profile Hidden Markov models to perform sensitive database searching using statistical descriptions of a sequence family's consensus.
LASTZ (1.02) [Helix]
LASTZ is a tool for (1) aligning two DNA sequences, and (2) inferring appropriate scoring parameters automatically. LASTZ is a drop-in replacement for BLASTZ, and is backward compatible with BLASTZ's command-line syntax.
Discover motifs in groups of DNA/protein sequences or databases.
MFOLD predicts DNA and RNA secondary structure.
MUMmer (3.23) [Helix]
A system for aligning entire genomes extremely rapidly.
NCBI C++ Toolkit (12.0.0) [Helix & Biowulf]
A collection of executables and libraries from the NCBI C++ Toolkit have been compiled for Helix and Biowulf.
NestedMICA is a method for discovering over-represented short motifs in large sets of strings. Typical applications include finding candidate transcription factor binding sites in DNA sequences.
A parallel implementation for Multifactor dimensionality reduction (MDR) for detecting gene-gene and gene-environment interactions.
PolyPhen-2 (Polymorphism Phenotyping v2) is a software tool which predicts possible impact of amino acid substitutions on the structure and function of human proteins using straightforward physical and evolutionary comparative considerations.
PRANK is a probabilistic multiple alignment program for DNA, codon and amino-acid sequences. PRANK is based on a novel algorithm that treats insertions correctly and avoids over-estimation of the number of deletion events.
RandFold computes the probability that, for a given RNA sequence, the Minimum Free Energy (MFE) of the secondary structure is different from a distribution of MFE computed with random sequences.
Screens DNA sequences for repetitive elements and returns a masked query sequence ready for database searches, as well as a table annotating the masked regions.
Seaview (4.4.0) [Helix]
Seaview is a graphical multiple sequence alignment editor. It is able to read and write various alignment formats (NEXUS, MSF, CLUSTAL, FASTA, PHYLIP, MASE). It allows one to manually edit the alignment, and also to run DOT-PLOT or CLUSTALW/MUSCLE programs to locally improve the alignment.
Sequence Format Converters [Helix & Biowulf]
These programs convert sequence data from one format to another.
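As an illustration of what such a conversion involves, the following is a minimal Python sketch that turns four-line FASTQ records into FASTA. It is not one of the converters installed on Helix or Biowulf; the function name `fastq_to_fasta` is hypothetical, and the sketch assumes each record occupies exactly four lines (no wrapped sequences).

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Convert four-line FASTQ records to FASTA.

    Assumes every record is exactly four lines: '@' header, sequence,
    '+' separator, quality string. Returns the number of records written.
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of input
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("FASTQ header must start with '@': " + header)
        sequence = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality line, dropped because FASTA has no qualities
        fasta_handle.write(">" + header[1:] + "\n" + sequence + "\n")
        count += 1
    return count


if __name__ == "__main__":
    source = io.StringIO("@read1\nACGT\n+\nIIII\n")
    target = io.StringIO()
    fastq_to_fasta(source, target)
    print(target.getvalue(), end="")
```

Any open text file objects can be passed in place of the in-memory `io.StringIO` handles used in the demonstration.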
SIFT predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids.
snpEff (3.3) [Helix]
snpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of variants on genes (such as amino acid changes).
UCSC Executables [Helix & Biowulf]
A collection of executables from UCSC have been compiled on Biowulf. The programs perform a multitude of tasks from simple number crunching to highly specific sequence analysis and database construction.
VEP (73) [Biowulf]
VEP (Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Note 1. Most applications which run on Helix can also run on the Biowulf cluster. Feel free to drop us a line if the corresponding Biowulf webpage link does not exist and you want to know how to run an application on Biowulf.
Note 2. A large variety of scientific databases are maintained in several formats on Helix/Biowulf, including parts of the 1000 Genomes data, human, mouse, and other genomes, and NCBI nonredundant protein and nucleotide databases. See here for a full list and update status.
align2rawsignal reads in a set of tagAlign/BAM files, filters out multi-mapping tags and creates a consolidated genome-wide signal file using various tag-shift and smoothing parameters as well as various normalization schemes.
This tool is used to extract raw sequences (with qualities). We envision this tool being primarily useful to those wishing to duplicate or extend previous analyses.
BamTools provides a fast, flexible C++ API & toolkit for reading, writing, and manipulating BAM files.
bamUtil is a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, bam.
BCFTools (0.1.19) [Helix] [Biowulf]
BCFTools is part of the Samtools package now. See Samtools below.
For BCFTools / htslib bcftools commands, see vcftools below.
A suite of tools to address common questions raised in genomic studies, mostly with regard to overlap and proximity relationships between data sets. BEDOPS aims to be scalable and flexible, facilitating the efficient and accurate analysis and management of large-scale genomic data.
The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. In addition, one can develop sophisticated pipelines that answer complicated research questions by "streaming" several BEDTools together.
The Blat-like Fast Accurate Search Tool (BFAST) facilitates the fast and accurate mapping of short reads to reference sequences. Specifically, BFAST was designed to facilitate whole-genome resequencing, where mapping billions of short reads with variants is of utmost importance. BFAST supports both Illumina and ABI SOLiD data, as well as any other Next-Generation Sequencing Technology (454, Helicos).
Bioscope (1.3.1) [Biowulf]
SOLiD Bioscope provides a command line interface for running application-specific sequence analysis tools.
Bowtie is an ultrafast, memory-efficient short read aligner geared toward quickly aligning large sets of short DNA sequences (reads) to large genomes.
BWA is a fast, lightweight tool that aligns short sequences to a sequence database, such as the human reference genome. BWA excels in its speed. Mapping 2 million high-quality 35bp short reads against the human genome can be done in 20 minutes.
Cis-regulatory Element Annotation System is a tool designed to characterize genome-wide protein-DNA interaction patterns from ChIP-chip and ChIP-Seq of both sharp and broad binding factors.
Creates genomic builds, calls SNPs, detects indels, and counts reads from data generated from one or more sequencing runs.
Cgatools provide tools for downstream analysis of Complete Genomics data. The focus is to provide command line utilities. The general areas of functionality include genome comparison, format conversion, and reference tools.
Circos is a program for the generation of publication-quality, circularly composited renditions of genomic data and related annotations. Circos is particularly suited for visualizing alignments, conservation and intra and inter-chromosomal relationships. Also, Circos is useful to visualize any type of information that benefits from a circular layout. Thus, although it has been designed for the field of genomics, it is sufficiently flexible to be used in other data domains.
CoNIFER uses exome sequencing data to find copy number variants (CNVs) and genotype the copy-number of duplicated genes.
Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.
cutadapt removes adapter sequences from high-throughput sequencing data. This is usually necessary when the read length of the sequencing machine is longer than the molecule that is sequenced, for example when sequencing microRNAs.
deFuse is a software package for gene fusion discovery using RNA-Seq data. The software uses clusters of discordant paired end alignments to inform a split read alignment analysis for finding fusion boundaries. The software also employs a number of heuristic filters in an attempt to reduce the number of false positives and produces a fully annotated output for each predicted fusion.
diffReps is a collection of (perl) programs and modules used for detection of differential chromatin modification sites from ChIP-seq data with biological replicates.
DELLY is an integrated structural variant prediction method that can detect deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-ends and split-reads to sensitively and accurately delineate genomic rearrangements throughout the genome.
A program for calling small indels from short-read sequence data. It is currently designed to handle only Illumina data.
eXpress is a streaming tool for quantifying the abundances of a set of target sequences from sampled subsequences. Example applications include transcript-level RNA-Seq quantification, allele-specific/haplotype expression analysis (from RNA-Seq), transcription factor binding quantification in ChIP-Seq, and analysis of metagenomic data.
FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines.
A collection of command-line tools for preprocessing short-read FASTA/FASTQ files.
FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), and MNPs (multi-nucleotide polymorphisms) smaller than the length of a short-read sequencing alignment. Ogap realigns alignments meeting specified criteria (number of gaps, mismatches) using Smith-Waterman parameters optimized to open gaps and eliminate mismatches, and writes the stream of alignments as BAM on stdout.
An efficient fusion aligner which aligns reads spanning fusion junctions directly to the genome without prior knowledge of potential fusion regions.
A computational framework to identify fusion transcripts from paired-end RNA-Seq data.
Galaxy at Helix [Web]
Galaxy is an open, web-based platform for data intensive biomedical research. Many of the tools are run on the Biowulf cluster. A Helix login and password are required.
A structured software library that makes it easy to write efficient analysis tools for next-generation sequencing data, and also a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas.
GLU is a framework and a software package that was designed to store, clean, and analyze data generated by whole-genome or candidate gene association scans.
GMAP: A Genomic Mapping and Alignment Program for mRNA and EST Sequences.
GSNAP: Genomic Short-read Nucleotide Alignment Program.
HiSeq (2.3.20-4) [Helix] [Biowulf]
Provides rapid and easy alignment and variant calling for whole human genomes or libraries prepared with the Nextera Rapid Capture Exome enrichment kit.
HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and ChIP-Seq analysis.
HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.
IGV (2.3.20) and IGVTools (2.3.20) [Helix] [Biowulf]
The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated datasets.
LifeScope (2.5.1) [Helix] [Biowulf]
A modular bioinformatics data analysis tool for performing off-instrument secondary and tertiary analyses on sequence data generated by Life Technologies instruments.
Software for finding and categorizing structural variation in genome sequencing data.
Model-based Analysis of ChIP-Seq (MACS) empirically models the length of the sequenced ChIP fragments from short Illumina/Solexa reads, and uses it to improve the spatial resolution of predicted protein binding sites on DNA.
Multiple alignment program for amino acid or nucleotide sequences
Maq is software that builds mapping assemblies from short reads generated by next-generation sequencing machines. It is particularly designed for the Illumina-Solexa 1G Genetic Analyzer, and has preliminary functions to handle ABI SOLiD data.
MEGAN (4) [Helix]
metAMOS is an integrated assembly and analysis pipeline for metagenomic data.
MIRA is a Whole Genome Shotgun and EST Sequence Assembler. It is able to perform true hybrid de-novo assemblies and mapping assemblies of data from the 454 and Illumina/Solexa sequencing machines, either on their own or together with Sanger type sequencing data (hybrid mapping assemblies).
miRanda is an algorithm for finding genomic targets for microRNAs.
miRDeep2 is a completely overhauled tool which discovers microRNA genes by analyzing sequenced RNAs.
Probabilistic analysis and design of RNA-Seq experiments for identifying isoform regulation
MOSAIK is a reference-guided assembler that can work with FASTA, FASTQ, Illumina Bustard & Gerald, or SRF file formats and outputs phrap ace and GigaBayes gig formats.
MuTect is a method developed at the Broad Institute for the reliable and accurate identification of somatic point mutations in next generation sequencing data of cancer genomes.
MutSig (1.4) [Helix] [Biowulf]
MutSig analyzes lists of mutations discovered in DNA sequencing, to identify genes that were mutated more often than expected by chance given background mutation processes.
ngsplot (2.02) [Helix]
ngsplot is an easy-to-use global visualization tool for next-generation sequencing data.
The entire novocraft package (novoalign, novoalignMPI, novoalignCS, novoalignCSMPI, novomethyl, novobarcode, etc.) is available on Helix/Biowulf. Novoalign is an aligner for single-ended and paired-end reads from the Illumina Genome Analyser. Novoalign finds global optimum alignments using the full Needleman-Wunsch algorithm with affine gap penalties whilst performing at the same or better speed than aligners that are limited to two mismatches and no insertions or deletions.
optiCall is designed to make accurate genotype calls across the minor allele frequency spectrum. Using intensity information from across multiple individuals and multiple SNPs when calling genotypes allows it to call both rare and common variants accurately.
PartekGS (6.6-LINUX64-6.13.040) [Helix]
Rigorous and easy-to-use statistical tests for differential expression of genes or exons, and a flexible and powerful statistical test to detect alternative splicing based on a powerful mixed model analysis of variance.
Phred (071220) /Phrap (1.090518) /Consed (23.0) [Helix]
The phred software reads DNA sequencing trace files, calls bases, and assigns a quality value to each called base. phrap is a program for assembling shotgun DNA sequence data. Consed/Autofinish is a tool for viewing, editing, and finishing sequence assemblies created with phrap.
A set of tools (in Java) for working with next generation sequencing data in the BAM format.
Pindel can detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications and other structural variants at single-based resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.
Library for the analysis of genetic variation data
QuantiSNP (2) [Helix]
Ray is parallel software that computes de novo genome assemblies of next-gen sequencing data using the message passing interface (MPI).
A Java program which computes a series of quality control metrics for RNA-seq data.
The RSEG software package is aimed to analyze ChIP-Seq data, especially for identifying genomic regions and their boundaries marked by diffusive histone modification markers, such as H3K36me3 and H3K27me3.
Comprehensively evaluates RNA-seq datasets generated from clinical tissues or other well annotated organisms such as mouse, fly and yeast.
A modular framework to analyze RNA-Seq data using compact and anonymized data summaries.
RUM (2.0.4) [Helix][Biowulf]
RUM is an alignment, junction calling, and feature quantification pipeline specifically designed for Illumina RNA-Seq data.
SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
Scripture is a method for transcriptome reconstruction that relies solely on RNA-Seq reads and an assembled genome to build a transcriptome ab initio. The statistical methods to estimate read coverage significance are also applicable to other sequencing data. Scripture also has modules for ChIP-Seq peak calling.
Scythe uses a Naive Bayesian approach to classify contaminant substrings in sequence reads. It considers quality information, which can make it robust in picking out 3'-end adapters, which often include poor quality bases.
Sickle is a windowed adaptive trimming tool for FASTQ files using quality.
Identifies single nucleotide positions that are different between tumor and normal cells.
Program for the detection of structural variation events from whole genome sequenced read pair data.
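To make the idea of window-based quality trimming concrete, here is a simplified Python sketch. It is not Sickle's actual algorithm or interface: the function names, the default window size of 4, the threshold of 20, and the Phred+33 encoding are all assumptions chosen for illustration, and the sketch trims only from the 3' end.

```python
def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def window_trim(sequence, quality_string, window=4, threshold=20, offset=33):
    """Cut a read at the start of the first window whose mean quality is below threshold.

    Slides a fixed-size window from the 5' end toward the 3' end and truncates
    the read where the window average first drops below `threshold`.
    Returns the trimmed (sequence, quality_string) pair.
    """
    scores = phred_scores(quality_string, offset)
    if len(sequence) != len(scores):
        raise ValueError("sequence and quality string differ in length")
    cut = len(scores)  # default: keep the whole read
    last_start = max(len(scores) - window + 1, 1)
    for start in range(last_start):
        chunk = scores[start:start + window]
        if chunk and sum(chunk) / len(chunk) < threshold:
            cut = start
            break
    return sequence[:cut], quality_string[:cut]


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
    print(window_trim("ACGTACGT", "IIII####"))
```

Because the cut is placed at the start of the failing window, a few good bases just before the low-quality tail can be removed as well; real trimmers refine this boundary in different ways.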
NCBI SRA toolkit.
SVA is a computer software project designed to annotate, visualize, and analyze the genetic variants identified through next-generation sequencing studies, including whole-genome sequencing (WGS) and exome sequencing studies.
SHRiMP is a software package for aligning genomic reads against a target genome. It was primarily developed with the multitudinous short reads of next generation sequencing machines in mind, as well as Applied Biosystems colourspace genomic representation.
SOAP has evolved from a single alignment tool into a tool package that provides a full solution for next generation sequencing data analysis. Currently, it consists of a new alignment tool (SOAPaligner/soap2), a re-sequencing consensus sequence builder (SOAPsnp), an indel finder (SOAPindel), a structural variation scanner (SOAPsv), and a de novo short reads assembler (SOAPdenovo). A GPU-accelerated alignment tool (SOAP3/GPU) is being implemented.
SpliceMap is a de novo splice junction discovery and alignment tool. It offers high sensitivity and support for arbitrary RNA-seq read lengths.
SpliceTrap is a statistical tool for quantifying exon inclusion ratios in paired-end RNA-seq data, with broad applications for the study of alternative splicing.
SOLiD Software Suite provides software tools for data processing and analysis generated on SOLiD Analyzer. It supports multiple applications, is integrable with custom analysis pipelines and can complete primary (image acquisition and quality control) and secondary (alignment to a reference genome, base calling, and SNP identification) analysis of fragment and mate-paired experiments.
SSAHA2 (Sequence Search and Alignment by Hashing Algorithm) is a pairwise sequence alignment program designed for the efficient mapping of sequencing reads onto genomic reference sequences.
STAR aligns RNA-seq reads to a reference genome using uncompressed suffix arrays.
A fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
TophatFusion is an enhanced version of TopHat with the ability to align reads across fusion points, which result from the breakage and re-joining of two different chromosomes, or from rearrangements within a chromosome.
USeq is a collection of software tools for both low and high level analysis of next generation, ultra high throughput signature sequencing data from the Solexa, SOLiD, and 454 platforms. Initial emphasis: ChIP-Seq and RNA-Seq with FDR estimations.
VarScan is for variant detection in massively parallel sequencing data.
VarSifter (1.6) [Helix]
A graphical Java program designed to display, sort, filter, and generally sift variation data from massively parallel sequencing experiments.
VCFtools contains a Perl API (Vcf.pm) and a number of Perl scripts that can be used to perform common tasks with VCF files such as file validation, file merging, intersecting, complements, etc.
A de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed at the EBI.
The Vienna RNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.
XHMM uses principal component analysis (PCA) normalization and a hidden Markov model (HMM) to detect and genotype copy number variation (CNV) from normalized read-depth data from targeted sequencing experiments.
Collection of linkage and association software.
BEAGLE (3.3.2) [Helix]
For imputing genotypes, inferring haplotype phase, and performing genetic association analysis. BEAGLE is designed to analyze large-scale data sets with hundreds of thousands of markers genotyped on thousands of samples.
ChromoPainter, ChromoCombine, & FineStructure (0.0.4) [Biowulf]
ChromoPainter is a tool for finding haplotypes in sequence data. ChromoCombine is a tool to help manage the large number of files generated when running ChromoPainter in parallel on a large number of separate compute nodes. fineSTRUCTURE is a fast and powerful algorithm for identifying population structure using dense sequencing data.
The EIGENSOFT package combines functionality from our population genetics methods and our EIGENSTRAT stratification correction method.
These programs implement statistical techniques used to map genes and find the approximate location of disease genes.
Floss (1.4) [Helix]
The FLOSS software package uses input and output files from the MERLIN linkage analysis package to perform an ordered subset analysis.
GCTA (1.20) [Helix]
GCTA (Genome-wide Complex Trait Analysis) is designed to estimate the proportion of phenotypic variance explained by genome- or chromosome-wide SNPs for complex traits.
Generalized Disequilibrium Test (GDT) (0.1.1) [Helix]
GDT is a software package for family-based genome-wide association analysis
Genie (126.96.36.199) [Helix]
Genie is a general-purpose tool to analyze association and transmission disequilibrium (TDT) between genetic markers and traits in studies of families and independent individuals.
GERMLINE is a program for discovering long shared segments of Identity by Descent (IBD) between pairs of individuals in a large population.
IMPUTE is a program for estimating ("imputing") unobserved genotypes in SNP association studies.
Loki is a linkage analysis package, primarily for large and complex pedigrees, which uses Markov chain Monte Carlo (MCMC) techniques to avoid many of the computational problems that prevent exact computational methods being used for large pedigrees.
MACH 1.0 is a Markov Chain based haplotyper. It can resolve long haplotypes or infer missing genotypes in samples of unrelated individuals.
Mach2qtl performs QTL analysis based on imputed dosages/posterior_probabilities.
Mega2 (4.5.4) [Helix]
A data-handling program for facilitating genetic linkage and association analyses
Mendel (11.0) [Helix] [Biowulf]
A comprehensive package for exact statistical genetic analysis of qualitative and quantitative traits.
MERLIN uses sparse trees to represent gene flow in pedigrees and is one of the fastest pedigree analysis packages around (Abecasis et al, 2002).
Metal (2011-03-25) [Helix]
The METAL software is designed to facilitate meta-analysis of large datasets (such as several whole genome scans) in a convenient, rapid and memory efficient manner.
PBAT: Tools for the statistical analysis of family-based association studies (FBAT).
Pedcheck (1.00) [Helix]
Program for detecting marker typing incompatibilities in pedigree data.
PennCNV: kilobase-resolution detection of copy number variations (CNVs) from Illumina high-density SNP genotyping data.
PLINK is a whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
ProbABEL is a package for genome-wide association analysis of imputed data
Infers the relationships of pairs of individuals based on genetic marker data, either within families or across an entire sample.
A collection of compiled programs that perform a wide variety of genetic analyses.
SequenceLDhot (2006) [Helix]
Detecting Recombination Hotspots
SHAPEIT is a fast and accurate haplotype inference software.
Snplink (2005) [Helix]
Multipoint linkage analysis of densely distributed SNP data incorporating automated linkage disequilibrium removal.
SimWalk2 is a statistical genetics computer application for haplotype, parametric linkage, non-parametric linkage (NPL), identity by descent (IBD) and mistyping analyses on any size of pedigree.
SOLAR is a program for multipoint, oligogenic, variance component linkage analysis in pedigrees of arbitrary size and complexity (Almasy L; Blangero J, 1998).
UnPhased (3.1.6) [Helix]
A suite of programs for association analysis of multilocus haplotypes from unphased genotype data.
VEGAS is a program for performing gene-based tests for association using the results from genetic association studies. It annotates SNPs to corresponding genes, produces a gene-based test statistic, and then uses simulation to calculate an empirical gene-based p-value.
Vitesse (2) [Helix]
VITESSE is a software package that computes likelihoods with the functionality of the LINKMAP and MLINK programs from LINKAGE.
BEAST (Bayesian Evolutionary Analysis Sampling Trees) is a cross-platform program for Bayesian MCMC analysis of molecular sequences.
The Madeline 2.0 Pedigree Drawing Engine is a pedigree drawing program designed to handle large and complex pedigrees with an emphasis on readability and aesthetics
MrBayes (3.2.0) [Biowulf]
MrBayes performs Bayesian estimation of phylogeny.
PAUP* (Phylogenetic Analysis Using Parsimony) is a software package for inference of evolutionary trees.
A package of programs for inferring phylogenies (evolutionary trees). Includes parsimony, distance matrix, and likelihood methods.
QIIME is an open source software package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data).
SLR is a program to detect sites in coding DNA that are unusually conserved and/or unusually variable (that is, evolving under purifying or positive selection) by analysing the pattern of changes for an alignment of sequences on an evolutionary tree.
Affymetrix Power Tools (APT) (1.15.1) [Helix]
APT is a set of cross-platform command line programs that implement algorithms for analyzing and working with Affymetrix GeneChip arrays. APT programs are intended for "power users" who prefer programs that can be utilized in scripting environments and are sophisticated enough to handle the complexity of extra features and functionality.
An open source and open development software project for the analysis and comprehension of genomic data. It is an add-on to the R statistical analysis language and environment.
FastMEDUSA (1.1) [Biowulf]
FastMEDUSA is a parallel program to infer gene regulatory networks from gene expression and promoter sequences.
NEURON is a simulation environment for modeling individual neurons and networks of neurons. It provides tools for conveniently building, managing, and using models in a way that is numerically sound and computationally efficient. It is particularly well-suited to problems that are closely linked to experimental data, especially those that involve cells with complex anatomical and biophysical properties.
PARADIGM (1.0) [Helix]
PARADIGM (PAthway Representation and Analysis by DIrect reference on Graphical Models) is a factor graph framework for pathway inference on high-throughput genomic data.
AMBER (12) & AmberTools (12) [Biowulf]
AMBER is a package of molecular simulation programs.
APBS (1.4) [Biowulf]
APBS (Adaptive Poisson-Boltzmann Solver) is a software package for the numerical solution of the Poisson-Boltzmann equation (PBE), one of the most popular continuum models for describing electrostatic interactions between molecular solutes in salty, aqueous media.
CHARMm (c35b5) [Biowulf]
CHARMm is a general and flexible software application for modeling the structure and behavior of molecular systems.
GAMESS (01May13-R1) [Biowulf]
GAMESS is a general ab initio quantum chemistry package.
Gaussian (g09 D.01) [Biowulf]
Gaussian is a connected system of programs for performing semiempirical and ab initio molecular orbital (MO) calculations.
GROMACS (4.6.1) [Biowulf]
GROMACS is a versatile package to perform molecular dynamics.
LOOS (2.0.3b) [Helix]
LOOS is a code library for developing new analysis applications. It transparently reads the native file formats for most biomedical simulation packages, including CHARMM, NAMD, Gromacs, AMBER and Tinker.
MMTSB (Jul 2009) [Biowulf]
The Multiscale Modeling Tools for Structural Biology (MMTSB) Tool Set is a novel set of utilities and programming libraries that provide new enhanced sampling and multiscale modeling techniques for the simulation of proteins and nucleic acids. The tool set interfaces with the existing molecular modeling packages CHARMM and Amber for classical all-atom simulations, and with MONSSTER for lattice-based low-resolution conformational sampling.
NAMD (2.9) [Biowulf]
NAMD is a parallel molecular dynamics program for UNIX platforms designed for high-performance simulations in structural biology. VMD, the associated molecular visualization program, is also available on both Helix and Biowulf.
Q-Chem (188.8.131.52) [Biowulf]
NAMD Server [Web]
The NAMD Server at NIH is a web-based system supporting investigators at NIH who want to run NAMD ("Scalable molecular dynamics — high-performance simulations in structural biology") on the NIH Biowulf cluster. NIH login and password required.
OpenBabel (2.3.2) [Helix] [Biowulf]
Open Babel is a chemical toolbox designed to speak the many languages of chemical data.
TURBOMOLE (6.5) [Biowulf]
TURBOMOLE is a fast quantum chemical program package that is very stable and requires little memory and disk space. It consists of a series of modules and tools. Portions of the code are optimized for parallel use.
Autodock (184.108.40.206) & AutodockVina (1_1_2) [Biowulf]
Autodock is a suite of automated docking tools. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.
Biodesigner is a molecular modeling and visualization program for personal computers. The program is freely available for downloading. Biodesigner can create homology models of proteins, and evaluate and refine those models.
CSRosetta (1.0) [Biowulf]
Chemical-Shift-ROSETTA is a robust protocol to use NMR chemical shifts for de novo protein structure generation by SPARTA-based selection of protein fragments from the PDB, in conjunction with a regular ROSETTA Monte Carlo assembly and relaxation method.
HADDOCK (2.1) [Biowulf]
HADDOCK (High Ambiguity Driven protein-protein DOCKing) is an approach for predicting protein-protein complex structures that makes use of biochemical and/or biophysical interaction data such as chemical shift perturbation data resulting from NMR titration experiments or mutagenesis data.
Jackal (2002) [Helix][Biowulf]
Jackal is a collection of programs designed for the modeling and analysis of protein structures. Its core program is nest, a versatile homology modeling package.
MaxCluster (0.6.6) [Helix]
MaxCluster is a command-line tool for the comparison of protein structures. It provides a simple interface for a large number of common structure comparison tasks. A key feature of the program is the ability to process thousands of structures, either against a single reference protein or in an all-versus-all comparison.
PSIPRED (3.2) [Helix] [Biowulf]
Generates secondary structure predictions using up to four feed-forward neural networks and output from PSI-BLAST.
PyRosetta (r52071) [Helix] [Biowulf]
PyRosetta is an interactive Python-based interface to the powerful Rosetta molecular modeling suite. It enables users to design their own custom molecular modeling algorithms using Rosetta sampling methods and energy functions.
Rosetta (3.4) [Biowulf]
The Rosetta++ software suite can perform de novo protein structure predictions, identify low free energy sequences for target protein backbones, predict the structure of a protein-protein complex from the individual structures of the monomer components, incorporate NMR data into the basic Rosetta protocol to accelerate the process of NMR structure prediction, and more...
The core of RNAstructure is an implementation of the Zuker Algorithm to predict RNA secondary structures from sequence based on the principle of minimizing free energy. The thermodynamic data used for these predictions are the latest available from the Turner laboratory. Several modules are provided to extend the capabilities of the Zuker Algorithm and to make this a user-friendly RNA folding program.
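To give a feel for the dynamic programming behind secondary structure prediction, here is a minimal Python sketch. It is not the Zuker algorithm and not RNAstructure code: it swaps in the much simpler Nussinov-style recursion, which maximizes the number of nested base pairs instead of minimizing free energy with Turner parameters. The function name `max_base_pairs` and the `min_loop` parameter are hypothetical names chosen for this illustration.

```python
# Nussinov-style base-pair maximization: a simplified stand-in for
# energy-based folding, shown only to illustrate the recursion shape.
# ASSUMPTION: names and the min_loop default are illustrative, not RNAstructure's.

ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in seq.

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    # best[i][j] holds the best pair count for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some base k, splitting the interval
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in ALLOWED_PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0


print(max_base_pairs("GGGAAAUCC"))  # prints 3
```

A free-energy method replaces the "+1 per pair" score with stacking, loop, and bulge energies and minimizes the total, but the interval-splitting structure of the recursion is similar.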
Schrödinger [Helix] [Biowulf]
A limited number of Schrödinger applications are available on the Biowulf cluster through the Molecular Modeling Interest Group. Most are available through the Maestro GUI on Helix. The use of Schrödinger and any MMIGNET applications is limited to intramural NIH users.
ZDOCK (3.0.2) [Biowulf]
ZDOCK predicts protein-docking models, and uses a fast Fourier transform to search all possible binding modes for proteins, evaluating based on shape complementarity, desolvation energy, and electrostatics.
A small collection of molecular modeling software is available on the Helix Systems through the auspices of the Molecular Modeling Interest Group. Type mmignet list to display the programs available. To run any of these programs, at the Helix prompt you should type:
If multiple versions are available, they will be listed. Program parameters can be added at the end of the command line.
CCP4 (220.127.116.11) [Helix]
CCP4 is a suite of programs for protein crystallography and structural biology.
CNS (1.3) [Biowulf]
Crystallography and NMR System (CNS) is a large system for computational structural biology.
Molecules To Go [Web]
Allows a search of the PDB database and displays the result as text, image or interactive structure.
NMRPipe (2011) [Helix]
NMRPipe is an extensive software system for processing, analyzing, and exploiting NMR spectroscopic data.
Phenix (1.8.4-1496) [Helix]
PHENIX is a new software suite for the automated determination of macromolecular structures using X-ray crystallography and other methods.
ProFit (3.1) [Helix] [Biowulf]
ProFit is designed to be the ultimate protein least squares fitting program. It has many features including flexible specification of fitting zones and atoms, calculation of RMS over different zones or atoms, RMS-by-residue calculation, on-line help facility, etc.
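The RMS-over-zones idea can be sketched in a few lines of Python. This is only an illustration of the deviation formula, assuming the two coordinate sets are already superposed and have matching atom order. It does not perform the least-squares fit itself and is not ProFit's interface. The function name `rmsd` and the `zone` argument are hypothetical.

```python
import math


def rmsd(coords_a, coords_b, zone=None):
    """Root-mean-square deviation between two equally ordered coordinate lists.

    zone is an optional list of atom indices; only those atoms are compared.
    ASSUMPTION: the structures are already superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same number of atoms")
    indices = list(range(len(coords_a))) if zone is None else list(zone)
    if not indices:
        raise ValueError("zone must contain at least one atom")
    total = 0.0
    for i in indices:
        # squared distance between atom i in the two structures
        total += sum((p - q) ** 2 for p, q in zip(coords_a[i], coords_b[i]))
    return math.sqrt(total / len(indices))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
mobile = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 3.0)]

print(rmsd(reference, mobile))               # all atoms
print(rmsd(reference, mobile, zone=[0, 1]))  # first two atoms only: prints 0.0
```

Restricting the zone to the first two atoms gives zero deviation, while the full set is dominated by the one displaced atom. This is why per-zone and per-residue RMS values are useful alongside the global figure.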
Calculates surfaces, volumes, B-factor plots, hydrogen bonds, secondary structure and more from a PDB coordinate ID or uploaded PDB file.
XPLOR-NIH (2.34) [Biowulf]
XPLOR-NIH is a structure determination program which builds on the X-PLOR v3.851 program, including additional tools developed at the NIH.
Comsol (4.3a) [Helix]
COMSOL Multiphysics (previously named FEMLAB) is a modeling package for the simulation of any physical process you can describe with partial differential equations (PDEs). It features state-of-the-art solvers that address complex problems quickly and accurately, while its intuitive structure is designed to provide ease of use and flexibility.
The GAUSS Mathematical and Statistical System is an easy-to-use data analysis environment based on the fast and powerful GAUSS Matrix Programming Language designed for computationally intensive tasks. GAUSS has over 400 functions built in, including LINPACK, EISPACK, and BLAS routines. In addition you can add your own functions to this library.
The GNU Scientific Library (GSL) is a collection of C routines for numerical analysis. The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions with a test suite.
IDL is software for data analysis, visualization, and cross-platform application development. IDL combines all of the tools you need for any type of project, from "quick-look," interactive analysis and display to large-scale commercial programming projects. IDL can be used to rapidly develop algorithms, interfaces, and powerful visualizations and quickly crunch through large numerical problems.
IVEware (0.2) [Helix]
IVEware is imputation and variance estimation software that can be called from SAS or independently. It uses a multivariate sequential regression approach for obtaining the imputed values.
An interactive system for doing mathematical computation. It performs numerical, symbolic and graphical computations, and incorporates a high-level programming language. Graphics may be viewed on interactive terminals, X11 window servers and PostScript printers. See the file /usr/local/doc/Mathematica.txt for an introduction to the use of Mathematica on the NIH Helix Systems. The Mathematica man pages can be perused by typing man math, man mathremote, and man psfix.
A high-performance interactive software package for scientific and engineering numeric computation. MATLAB integrates numerical analysis, matrix computation, signal processing, and graphics in an environment where problems and solutions are expressed just as they are written mathematically.
Pari/GP (2.5.5) [Helix]
PARI/GP is a widely used computer algebra system designed for fast computations in number theory (factorizations, algebraic number theory, elliptic curves...).
Octave (3.6.1) [Biowulf]
GNU Octave is an open-source language for numerical calculations that has a command-line interface and can interpret many (but not all) Matlab scripts. It is not license-limited and so can be used for many simultaneous independent runs.
OpenBUGS (3.2.2) [Biowulf]
OpenBUGS is a software package for performing Bayesian inference Using Gibbs Sampling.
R (the R Project) is a language and environment for statistical computing and graphics. R is similar to S, and provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, ...).
ROOT (5.26) [Helix]
The ROOT system provides a set of Object-Oriented frameworks with all the functionality needed to handle and analyse large amounts of data in a very efficient way. Having the data defined as a set of objects, specialised storage methods are used to get direct access to the separate attributes of the selected objects, without having to touch the bulk of the data. Included are histogramming methods in 1, 2 and 3 dimensions, curve fitting, function evaluation, minimisation, graphics and visualization classes to allow the easy setup of an analysis system that can query and process the data interactively or in batch mode.
Base SAS provides a scalable, integrated software environment specially designed for data access, transformation and reporting.
Scilab is an open-source alternative to Matlab which includes hundreds of mathematical functions and the ability to interactively add C/Fortran programs. It includes a Matlab->Scilab converter.
S-PLUS (8.0) [Helix]
An object-oriented language for data analysis, with many functions for statistical, numerical and graphical techniques. Users can extend the language by designing new functions using the S language. Graphics may be displayed on interactive terminals or X-Window servers.
InsPecT (2012.01.09) [Biowulf]
InsPecT is a MS/MS database search tool specifically designed to address two crucial needs of the proteomics community: post-translational modification identification and search speed.
Mascot (2.4) [Web]
The Mascot search engine uses mass spectrometry data to identify proteins from primary sequence databases. Mascot searches can be run directly from http://biospec.nih.gov, or by using the Mascot daemon on your own desktop PC.
OMSSA (2.1.9) [Biowulf]
An efficient search engine for identifying MS/MS peptide spectra by searching libraries of known protein sequences. OMSSA scores significant hits with a probability score developed using classical hypothesis testing, the same statistical method used in BLAST.
Bayesian analysis of blinking and bleaching, or 3B microscopy, is a method which analyses data in which many overlapping fluorophores undergo bleaching and blinking events, giving the structure at enhanced resolution.
AFNI (Analysis of Functional NeuroImages) is a set of C programs for processing, analyzing, and displaying functional MRI (FMRI) data - a technique for mapping human brain activity.
Analyze is a comprehensive, robust and powerful software package for 3D biomedical image visualization and analysis. Requires an X-Windows capable desktop machine. To use it, type analyze at the prompt.
Bsoft (1.8.6) [Biowulf]
Bsoft is a collection of programs and a platform for development of software for image and molecular processing in structural biology. Problems in structural biology are approached with a highly modular design, allowing fast development of new algorithms without the burden of issues such as file I/O. It provides an easily accessible interface, a resource that can be and has been used in other packages.
EMAN (1.9) [Biowulf]
EMAN is a suite of scientific image processing tools aimed primarily at the transmission electron microscopy community, though it is beginning to be used in other fields as well.
EMAN2 (2.07) [Biowulf]
EMAN2 is the successor to EMAN1. It is a broadly based greyscale scientific image processing suite with a primary focus on processing data from transmission electron microscopes.
FreeSurfer is a set of automated tools for reconstruction of the brain's cortical surface from structural MRI data, and overlay of functional MRI data onto the reconstructed surface.
FSL (5.0) [Biowulf]
FSL is a comprehensive library of image analysis and statistical tools for FMRI, MRI and DTI brain imaging data.
Situs (2.6.2) [Biowulf]
Huygens (4.4.0-p8) [Helix]
Image restoration, deconvolution, resolution and noise reduction. It can process images from all current optical microscopes, including wide-field, confocal, Nipkow (scanning disk confocal), multiple-photon, and 4Pi microscopes.
IDL/ENVI (8.2/5.0) [Helix]
IDL is a complete computing environment for the interactive analysis and visualization of data. IDL integrates an array-oriented language with mathematical analysis and graphical display techniques. ENVI is designed for extracting information from geospatial and medical imagery.
ImageJ is a public domain Java image processing program inspired by NIH Image for the Macintosh. It runs, either as an online applet or as a downloadable application, on any computer with a Java 1.1 or later virtual machine.
Imaris provides scientists with solutions for processing, visualizing and analyzing multi-dimensional microscopic images. It reads images in many of the most commonly used proprietary formats.
Image processing toolbox.
MEDx is a UNIX based software package used to visualize and analyze 2D and 3D medical image data.
The MIPAV (Medical Image Processing, Analysis, and Visualization) application enables quantitative analysis and visualization of medical images of numerous modalities (e.g. PET, MRI, CT, microscopy). Using MIPAV's standard user-interface and analysis tools, researchers at remote sites (via the internet) can easily share research data and analyses, thereby enhancing their ability to research, diagnose, monitor, and treat medical disorders.
MRIcro allows Windows and Linux computers to view medical images. It is a standalone program, but includes tools to complement SPM (software that allows neuroimagers to analyse MRI, fMRI and PET images). MRIcro allows efficient viewing and exporting of brain images. In addition, it allows neuropsychologists to identify regions of interest (ROIs, e.g. lesions). MRIcro can create Analyze format headers for exporting brain images to other platforms.
OpenDX is a uniquely powerful, full-featured software package for the visualization of scientific, engineering and analytical data. Its open system design is built on standard interface environments.
OsiriX is image processing software dedicated to DICOM images (".dcm" / ".DCM" extension) produced by medical equipment (MRI, CT, PET, PET-CT, ...) and confocal microscopy (LSM and BioRAD-PIC format).
ParaView uses the Visualization Toolkit (VTK), a software system for 3D computer graphics, image processing, and visualization, as its data processing and rendering engine. It is designed to visualize data sets of size varying from small to very large.
Slicer, or 3D Slicer, is a free, open source software package for visualization and image analysis.
TORTOISE (Tolerably Obsessive Registration and Tensor Optimization Indolent Software Ensemble) is for processing diffusion MRI data.
UCSF Chimera is a highly extensible program for interactive visualization and analysis of molecular structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, trajectories, and conformational ensembles.
Coot (0.7.2) [Helix]
Coot is for macromolecular model building, model completion and validation, particularly suitable for protein modelling using X-ray data.
GaussView (5) [Helix]
GRASP2 is an updated version of the GRASP program used for macromolecular structure and surface visualization. It is written for the Windows operating system and contains a large number of new features and scientific tools.
iMol allows loading molecules using several file formats: PDB, XYZ, MOL2, HIN, CAR, ALC, BIO. The molecules can be saved as PDB, XYZ or BIO files. The BIO file stores all view and rendering settings (e.g. colors, lighting, orientation of molecules). iMol can easily handle both small and large molecules. It can load multiple molecules, and move and rotate them independently.
Jmol is a molecular viewer for three-dimensional chemical structures. Features include reading a variety of file types and output from quantum chemistry programs, and animation of multi-frame files and computed normal modes from quantum programs.
Molscript (2.1.2) [Helix]
MolScript is a program for displaying molecular 3D structures, such as proteins, in both schematic and detailed representations. To use it, type 'molscript' at the prompt.
Molauto (1.1) [Helix] [Biowulf]
MolAuto is a program for producing good first-approximation MolScript input files (scripts) from a coordinate file. To use it, type molauto at the helix prompt.
POVRAY (Persistence of Vision RAYtracer) is a high-quality tool for creating three-dimensional graphics. Raytraced images are publication-quality and 'photo-realistic', but are computationally expensive so that large images can take many hours to create. PovRay images can also require more memory than many desktop machines can handle. To address these concerns, a parallelized version of PovRay has been installed on the Biowulf system.
PROCHECK (3.5) [Helix] [Biowulf]
PROCHECK checks the stereochemical quality of a protein structure, producing a number of PostScript plots analysing its overall and residue-by-residue geometry. It includes PROCHECK-NMR for checking the quality of structures solved by NMR.
PyMOL is a comprehensive molecular visualization product for rendering and animating 3D molecular structures.
Rasmol (2.7.5) [Helix] [Biowulf]
Rasmol is a program for molecular graphics visualization. To use, type rasmol at the Helix or Biowulf prompt.
VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting. To use, type vmd at the prompt.
- Java (1.7.0_25)
- Perl (5.12.1)
- PHP (5.1.6)
- Python (2.7.3)
- Ruby (1.9.3-p125)
- awk, gawk, tcl/tk, and other typical Unix tools
Compilers [Helix] [Biowulf]
- gcc (type man gcc for details)
- Fortran 77 (type man g77 for details)
- Fortran 95 (type man gfortran for details)
- Portland Group compilers
[PGI Compiler documentation] To initialize, type
source /usr/local/pgi/pgivars.sh (bash shell)
source /usr/local/pgi/pgivars.csh (csh or tcsh shells).
- pgcc (C)
- pgCC (C++)
- Pathscale compilers
[Documentation] To initialize, type
source /usr/local/pathscale/pathvars.sh (bash shell)
source /usr/local/pathscale/pathvars.csh (csh or tcsh shell).
- pathcc (C)
- pathCC (C++)
- Intel compilers
[Intel documentation site] To initialize, type
source /usr/local/intel/intelvars.sh (bash shell)
source /usr/local/intel/intelvars.csh (csh or tcsh shell).
Haskell is an advanced purely-functional programming language.
IMSL (6.0) [Helix]
A widely used library of mathematical and statistical routines in Fortran. To use the IMSL libraries,
SCSL (Scientific Computing Software Library) [Helix] [Biowulf]
A collection of high-performance routines that provide support for mathematical and numerical techniques used in scientific and technical computing. Included in SCSL are:
- BLAS (Basic Linear Algebra Subprograms) - a collection of some 400 subroutines
- LAPACK (Linear Algebra Package) - a portable library of subroutines for solving the most common dense linear algebra problems.
- Signal Processing - consists of routines that perform mixed-radix fast Fourier transforms (FFTs) as well as linear filtering operations such as convolution and correlation
- Sparse direct and iterative solvers
- Type 'man intro_scsl' on Helix for more information
Message-Passing Interface (MPI) [Biowulf]
MPI is a library specification for message-passing, designed for high performance on both massively parallel machines and on workstation clusters.
Batch systems are used to run programs at a later time or date, or to set up a large number of program runs.
Modules are a convenient and effective way to set up environments for applications.
Editors, such as vi and SciTE, are used to create and modify text files.
Web browsers and ftp are examples of tools that are used to communicate between computers.
Miscellaneous executables and tools for displaying, evaluating, manipulating, and dealing with scientific data.
A list of the scripting languages that are available.
A small selection of database management systems is available.
Tools for managing changes to documents, programs, and other information.
Acroread [Helix] [Biowulf]
Acroread is the standard tool for reading Adobe PDF files. Type acroread at the prompt.
Bluefish (2.2.3) [Helix]
Bluefish is a powerful editor targeted towards programmers and web designers, with many options to write websites, scripts and programming code.
Eye of GNOME (eog) [Helix]
Eye of GNOME is the official image viewer for the GNOME Desktop environment. It displays images, no frills, no surprises. Type eog at the prompt.
Ghostscript [Helix] [Biowulf]
Ghostscript is an X-Windows viewer for PostScript files. To use, type ghostscript [postscript filename] at the prompt.
Gimp [Helix] [Biowulf] [Sciware]
Gimp is the GNU Image Manipulation Program. It is a freely distributed piece of software suitable for such tasks as photo retouching, image composition and image authoring. It will display on a desktop machine running X-Windows. To use, type gimp at the prompt.
ImageMagick [Helix] [Biowulf]
ImageMagick is a collection of tools and libraries to read, write and manipulate images in over 89 formats. To use, type the command line tool name (e.g., display, or convert) at the prompt.
Tex / Latex (2013) [Helix] [Biowulf]
TeX and LaTeX are two parts of a high-quality typesetting system that includes features designed for the production of technical and scientific documentation. LaTeX is the de facto standard for the communication and publication of scientific documents. Also available are dvips, dvipdf, etc. Type 'module load tex' to set up the environment.
LibreOffice, a successor to OpenOffice, is the leading open-source office software suite for word processing, spreadsheets, presentations, graphics, databases and more. To use, type soffice [filename] at the prompt.
Asymptote (2.24) [Helix]
Asymptote is a powerful descriptive vector graphics language that provides a natural coordinate-based framework for technical drawings.
tmux (1.8) [Helix] [Biowulf]
tmux is a terminal multiplexer which lets you switch easily between several programs in one terminal, detach them and reattach them to a different terminal. To use, type module load tmux, then xterm -e tmux& or tmux at the prompt.
Nucleotide & Protein Sequence Databases [Helix/Biowulf]
A large collection of major nucleotide and protein databases is maintained in several formats on the Helix Systems. A comprehensive list of all databases on our systems is available, displaying the database name, format(s), and update status.
GeneTorrent (3.8.3) [Helix]
GeneTorrent is a set of executables for accessing data in the Cancer Genomics Hub.
Genome Browser Mirror Fragments [Web]
A partial mirror containing the reference sequences for the human, mouse, rat, and other vertebrate (and one insect) genomes, as well as tools for display and analysis.
This site displays expressed sequence tag (EST) cDNA clones from eye tissues (derived from NEIBank and other sources) aligned with current versions of the human, rhesus, mouse, rat, dog, cow, chicken, or zebrafish genomes, including reference sequences for known genes. This gives a simplified view of gene expression activity from different parts of the eye across the genome.
Protein Data Bank [Helix/Biowulf] [NFS-mountable]
The Helix Systems maintains a mirror of the Protein Data Bank entries, which can be accessed in the /pdb area on Helix, the Biowulf head node, or the Biowulf computational nodes. It is updated weekly.
The PDB mirror can also be NFS-mounted by any Unix system at NIH. To access this mirror via NFS, system administrators should add the following entry to /etc/fstab on the Unix desktop system:
helixdb.nih.gov:/pdb /pdb nfs ro
Type mkdir /pdb, and then mount -a to complete the mount. Contact firstname.lastname@example.org with any questions.
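Before running mount -a, it can help to confirm that the entry was added as intended. The following Python sketch reads fstab-style text and looks up the /pdb entry. The helper names `parse_fstab_line` and `find_mount` are hypothetical, and the requirement of at least four whitespace-separated fields (device, mount point, type, options) is an assumption of this sketch that matches the entry shown above.

```python
def parse_fstab_line(line):
    """Split one fstab-style line into named fields.

    Returns None for blank lines and comments.
    ASSUMPTION: a usable entry has at least four whitespace-separated fields.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 fields: %r" % line)
    return {
        "device": fields[0],
        "mount_point": fields[1],
        "fs_type": fields[2],
        "options": fields[3].split(","),
    }


def find_mount(fstab_text, mount_point):
    """Return the first entry whose mount point matches, or None."""
    for line in fstab_text.splitlines():
        entry = parse_fstab_line(line)
        if entry is not None and entry["mount_point"] == mount_point:
            return entry
    return None


# Example text containing the entry from this page; on a real system the
# text would come from reading /etc/fstab.
sample = "# PDB mirror\nhelixdb.nih.gov:/pdb /pdb nfs ro\n"
entry = find_mount(sample, "/pdb")
print(entry["device"], entry["fs_type"], entry["options"])
```

Checking that the options list contains ro confirms the mirror will be mounted read-only, as the entry specifies.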
The Cambridge Structural Database System (CSDS) consists of two major components:
The Cambridge Structural Database (CSD)
Bibliographic, 2D chemical and 3D structural results from crystallographic analyses of organics, organometallics and metal complexes. Both X-ray and neutron diffraction studies are included for compounds containing up to ca. 500 atoms (including hydrogens).
Software for Search, Retrieval, Analysis and Display of CSD contents
Comprises the programs QUEST/QUEST3D (for text, numeric, 2D substructure and 3D geometric searching), VISTA (for 3D analysis of QUEST3D searches), GSTAT (for 3D search and data analysis) and PLUTO (graphical display of 3D structures).