Structural Biology on the Biowulf Cluster
David Hoover, firstname.lastname@example.org
Helix Systems, CIT/NIH
December 5, 2007
Biowulf cluster -- details
Dependent vs. independent parallel processing
There are two computational situations that are very well suited to a large cluster:
Dependent parallel processing: large, monolithic processes which can be broken into smaller interdependent processes:
These are typically solved with an application that is already parallelized. The application usually takes an input file, possibly with some command-line options and required environment variables (for example, set by sourcing a setup file).
Independent parallel processing: short processes that can be run independently, with the results combined later on (also termed embarrassingly parallel):
These sometimes require some setup work from the user, typically done with shell scripts. More complex situations call for Perl or Python scripts, or C/C++/Fortran programs for the ambitious.
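As a minimal sketch of such a setup script, the bash function below splits a list of inputs (one per line) into N chunk files that can then be processed independently. The file names inputs.txt and chunk.N and the PDB IDs are hypothetical.

```bash
#!/bin/bash
# Sketch: split a list of inputs (one per line) into N chunk files,
# dealing lines round-robin so chunk sizes differ by at most one.
# File names (inputs.txt, chunk.N) are hypothetical.
split_inputs() {
    local list=$1 nchunks=$2 prefix=${3:-chunk}
    awk -v n="$nchunks" -v p="$prefix" \
        '{ print > (p "." ((NR - 1) % n)) }' "$list"
}

# Example: 10 hypothetical PDB IDs split into 3 chunks
# (chunk.0 gets 4 lines, chunk.1 and chunk.2 get 3 each).
printf '%s\n' 1abc 1abd 1abe 1abf 1abg 1abh 1abi 1abj 1abk 1abl > inputs.txt
split_inputs inputs.txt 3
```

Each chunk file can then become the input of one independent job.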
AMBER | home page: http://amber.scripps.edu/ | version: 9.0 | type: molecular dynamics | ease of use: * | documentation: http://biowulf.nih.gov/apps/amber.html | parallelized: yes | Myrinet: yes | scaling: 8-16 CPUs
AMBER is a package of molecular simulation programs. It is also a set of molecular mechanical force fields for the simulation of biomolecules. AMBER was initially created by Peter Kollman; the package is currently a joint development of at least six institutions.
There are about 50 programs in version 8. The main programs fall into three categories:
- Preparatory programs: LEaP, antechamber
- Simulation programs: sander, pmemd, nmode
- Analysis programs: ptraj and mm_pbsa
Unless you are a hard-core theoretical chemist, you will probably want to work through the AMBER tutorials before doing anything specialized.
AMBER is compiled to run as an MPI-parallel program across multiple nodes, and it can use Myrinet interconnects to improve efficiency. It scales to about 8 or 16 processors, depending on the processor and interconnect type as well as the program executed.
CHARMM | home page: http://www.charmm.org | version: 27-34, others | type: molecular dynamics | ease of use: * | documentation: http://biowulf.nih.gov/apps/charmm/index.html | parallelized: yes | Myrinet: yes | scaling: 16 CPUs
CHARMM (Chemistry at HARvard Macromolecular Mechanics) is a command-line program for performing molecular dynamics simulations of biomolecules. CHARMM was initially created by Martin Karplus; like AMBER, CHARMM is currently a joint development of at least a dozen institutions, including NIH. CHARMM is actively developed here at NIH by Rick Venable and Bernard Brooks.
CHARMM can run in parallel over either ordinary ethernet or Myrinet interconnects, depending on the version used.
Two command scripts, qcharmm and mpicharmm, are available for simplifying submission of CHARMM jobs to Biowulf. An input script (.inp) with a series of commands and variables is required to run.
Here is a decent introduction to MD using CHARMM.
GROMACS | home page: http://www.gromacs.org | version: 3.3.1, 3.2.1 | type: molecular dynamics | ease of use: *** | documentation: http://biowulf.nih.gov/apps/gromacs/index.html | parallelized: yes | Myrinet: yes | scaling: 10-20 CPUs
GROMACS (GROningen MAchine for Chemical Simulations) is touted as the World's Fastest Molecular Dynamics, and it is definitely more user-friendly than CHARMM or AMBER. It was designed and developed primarily by Herman Berendsen's group at the University of Groningen, with some collaboration from other institutions.
Coarse-grained simulations are also possible, speeding up simulations roughly 1000-fold.
NAMD | home page: http://www.ks.uiuc.edu/Research/namd | version: 2.6 | type: molecular dynamics | ease of use: *** | documentation: http://biowulf.nih.gov/apps/namd/index.html | parallelized: yes | Myrinet: no | scaling: 4-32+ CPUs
NAMD (Not Another Molecular Dynamics program) is a molecular dynamics simulation program that was designed specifically for Beowulf-class clusters (like Biowulf). It was developed by the Theoretical Biophysics Group at the Beckman Institute (University of Illinois).
NAMD, like GROMACS, primarily performs molecular dynamics. It scales quite well with ordinary ethernet interconnects, but not very well with Myrinet interconnects.
NAMD takes easily obtained PSF, PDB, and parameter files from CHARMM and X-PLOR as input, and is submitted via qsub with simple commands.
NAMD uses spatial decomposition for parallelism, whereas CHARMM and AMBER use atom decomposition (replicated data).
VMD was written specifically for NAMD, so the output is very easily visualized.
APBS | home page: http://apbs.sourceforge.net | version: 0.5.0 | type: electrostatics | ease of use: ** | documentation: http://biowulf.nih.gov/apps/apbs.html | parallelized: no | Myrinet: no | scaling: n/a
APBS (Adaptive Poisson-Boltzmann Solver) is a software package for the numerical solution of the Poisson-Boltzmann equation (PBE).
APBS is run in batch mode, and its output can be visualized using VMD. It is similar to GRASP, but is more complex and powerful.
GAMESS | home page: http://www.msg.ameslab.gov/GAMESS/ | version: Mar. 2007 | type: quantum chemistry | ease of use: *** | documentation: http://biowulf.nih.gov/apps/gamess.html | parallelized: yes | Myrinet: no | scaling: 8 CPUs?
GAMESS (the General Atomic and Molecular Electronic Structure System) is a general ab initio quantum chemistry package. GAMESS is maintained by the members of the Gordon research group at Iowa State University.
Gaussian03 | home page: http://www.gaussian.com/g03.htm | version: D02 | type: quantum chemistry | ease of use: *** | documentation: http://biowulf.nih.gov/apps/gaussian/ | parallelized: no | Myrinet: no | scaling: n/a
Gaussian03 is the latest in the Gaussian series of electronic structure programs. Designed to model a broad range of molecular systems under a variety of conditions, it performs its computations starting from the basic laws of quantum mechanics.
Q-Chem | home page: http://www.q-chem.com/ | version: 2.1 | type: quantum chemistry | ease of use: ** | documentation: http://biowulf.nih.gov/apps/q-chem.html | parallelized: yes | Myrinet: no | scaling: ?
Q-Chem is an ab initio electronic structure program capable of performing first principles calculations on both the ground and excited states of molecules.
HADDOCK | home page: http://www.nmr.chem.uu.nl/haddock/ | version: 2.0 | type: structure prediction | ease of use: * | documentation: http://biowulf.nih.gov/apps/haddock_biowulf.html | parallelized: yes | Myrinet: no | scaling: ?
HADDOCK (High Ambiguity Driven protein-protein DOCKing) is an approach for predicting protein-protein complex structures that makes use of biochemical and/or biophysical interaction data such as chemical shift perturbation data resulting from NMR titration experiments or mutagenesis data.
CNS / Xplor-NIH | home page: http://cns.csb.yale.edu/v1.1/ | version: 1.1 | type: structure determination and refinement | ease of use: * | documentation: http://helix.nih.gov/Applications/cns.html, http://biowulf.nih.gov/apps/xplor-nih.html | parallelized: yes and no | Myrinet: no | scaling: ?
Crystallography and NMR System (CNS) is a flexible multi-level package for macromolecular structure determination.
Xplor-NIH is a structure determination program that builds on the X-PLOR program, adding tools for NMR analysis. The main advantage of running Xplor-NIH on Biowulf is the ability to spawn a large number of independent refinement jobs across multiple Biowulf nodes.
AMoRe | home page: http://www.gv.cnrs-gif.fr/english/vs2-english.html | version: n/a | type: structure determination and refinement | ease of use: ** | documentation: http://biowulf.nih.gov/apps/amore/index.html | parallelized: no | Myrinet: no | scaling: n/a
AMoRe is an automated utility for performing molecular replacement using fast rotation and translation functions in a step-wise fashion.
POVRAY | home page: http://www.povray.org/ | version: 3.1, 3.6 | type: visualization | ease of use: ** | documentation: http://biowulf.nih.gov/apps/povray/index.html | parallelized: yes and no | Myrinet: no | scaling: n/a
POVRAY (Persistence of Vision RAYtracer) is a high-quality tool for creating three-dimensional graphics. Raytraced images are publication-quality and 'photo-realistic', but they are computationally expensive, so large images can take many hours to create. PovRay images can also require more memory than many desktop machines can handle. To address these concerns, a parallelized version of PovRay (povray_swarm) has been installed on the Biowulf system.
VMD | home page: http://www.ks.uiuc.edu/Research/vmd/current/docs.html | version: 1.8.6 | type: visualization | ease of use: **** | documentation: http://helix.nih.gov/Applications/vmd.html | parallelized: no | Myrinet: no | scaling: n/a
VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting. It has powerful and comprehensive filtering and configuration capabilities. It is especially well-suited for analyzing NAMD results.
RasMol | home page: http://www.umass.edu/microbio/rasmol/ | version: 22.214.171.124 | type: visualization | ease of use: *** | documentation: http://www.openrasmol.org/ | parallelized: no | Myrinet: no | scaling: n/a
RasMol is a molecular graphics program intended for the visualisation of proteins, nucleic acids and small molecules. The program is aimed at display, teaching and generation of publication-quality images. RasMol runs on a wide range of architectures and operating systems, including Microsoft Windows, Apple Macintosh, UNIX and VMS systems. UNIX and VMS versions require an 8, 24 or 32 bit colour X Windows display (X11R4 or later). The X Windows version of RasMol provides optional support for a hardware dials box and accelerated shared memory communication (via the XInput and MIT-SHM extensions) if available on the current X Server.
Rosetta++ | home page: http://www.rosettacommons.org/ | version: 2.2 | type: protein structure prediction and modeling | ease of use: * | documentation: http://biowulf.nih.gov/apps/Rosetta.html | parallelized: no | Myrinet: no | scaling: n/a
The Rosetta++ software suite focuses on the prediction and design of protein structures, protein folding mechanisms, and protein-protein interactions. The Rosetta codes have been repeatedly successful in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) competition as well as the CAPRI competition and have been modified to address additional aspects of protein design, docking and structure.
ZDOCK | home page: http://zlab.bu.edu/zdock/index.shtml | version: 2.3 | type: protein modeling | ease of use: *** | documentation: http://biowulf.nih.gov/apps/zdock.html | parallelized: yes | Myrinet: no | scaling: up to 32 CPUs
ZDOCK uses a fast Fourier transform to search all possible binding modes for the proteins, scoring them on shape complementarity, desolvation energy, and electrostatics.
nest | home page: http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:nest | version: n/a | type: homology modeling | ease of use: * | documentation: http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:nest | parallelized: no | Myrinet: no | scaling: n/a
nest is a program for modeling protein structure based on a given sequence-template alignment. It has the following capabilities:
- model building with artificial evolution
- sequence alignment tuning
- composite structure building
- model building based on multiple templates
- structure refinement
nest can be used to build homology models based on:
- a single sequence-template alignment
- multiple templates for the entire structure
- different templates for different regions of the structure
It also carries out energy based structure refinement and can change an alignment based on energetic considerations.
nest, and the entire Jackal suite from Jason Xiang, is also available through mmignet.
Autodock | home page: http://autodock.scripps.edu/ | version: 3.0.5 | type: protein modeling | ease of use: ** | documentation: http://biowulf.nih.gov/apps/autodock.html | parallelized: no | Myrinet: no | scaling: n/a
Autodock is a suite of automated docking tools. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure. Autodock was developed at the Scripps Research Institute in San Diego.
PROCHECK | home page: http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html | version: 3.5 | type: structure analysis | ease of use: ** | documentation: http://biowulf.nih.gov/apps/procheck/ | parallelized: no | Myrinet: no | scaling: n/a
PROCHECK checks the stereochemical quality of a protein structure, producing a number of PostScript plots analysing its overall and residue-by-residue geometry.
DSSP | home page: http://swift.cmbi.ru.nl/gv/dssp/ | version: Nov. 2002, CMBI version | type: structure analysis | ease of use: *** | documentation: n/a | parallelized: no | Myrinet: no | scaling: n/a
The DSSP program was designed by Wolfgang Kabsch and Chris Sander to standardize secondary structure assignment. DSSP is a database of secondary structure assignments (and much more) for all protein entries in the Protein Data Bank (PDB). DSSP is also the program that calculates DSSP entries from PDB entries.
Benchmarks for MD (AMBER, CHARMM, and NAMD): http://brooks.scripps.edu/charmm_docs/Benchmarks/chm_amb_namd.html
Running the applications
All applications run on Biowulf must be submitted through qsub. For short tests, an interactive session can be started with the -I flag, but all long runs (more than 30 minutes) should be submitted to the regular batch queue.
First, create a script (myjob.sh) containing the commands:
#!/bin/bash
myprog < /data/me/mydata
This is submitted with qsub:
qsub -l nodes=1 myjob.sh
At minimum, the number of nodes must be supplied with the -l nodes=N option. Required node properties can be appended, separated by colons:
qsub -l nodes=1:o2200:myr2k:m2048 myjob.sh
faste    fast ethernet (100 Mb/s) interconnect
gige     gigabit ethernet (1 Gb/s) interconnect
myr2k    Myrinet (2 Gb/s) interconnect
ib       Infiniband (10 Gb/s) interconnect
m1024    1 GB memory
m2048    2 GB memory
m4096    4 GB memory
p2800    2.8 GHz Intel Xeon
o2000    2.0 GHz AMD Opteron 246
o2200    2.2 GHz AMD Opteron 248
o2600    2.6 GHz AMD Opteron 285, dual-core (4 CPU)
o2800    2.8 GHz AMD Opteron 254
altix    SGI Altix 350 (see Firebolt page for more information)
x86-64   o2000 + o2200 nodes + o2800 nodes
dual-core (o2600) nodes
2.8 GHz dual-core (o2800) nodes running CentOS 4.2
-N name            Declare a name for the job
-m mail_options    Send mail to the user on 'a' (abort), 'b' (begin), and/or 'e' (end)
-k keep            Keep e (standard error) and/or o (standard output) files
-S path_list       Declare the shell that interprets the job script
-v variable_list   Export the named environment variables to the hosts running the job
-V                 Export all environment variables
All options except -l nodes=... can be placed within the job script as #PBS directives:
#!/bin/bash
#PBS -N MyJob
#PBS -m be
#PBS -k oe
#PBS -V
#PBS -S /bin/bash
myprog < /data/me/mydata
PBS-specific environment variables:
$PBS_O_HOST         name of the host on which the qsub command is running
$PBS_O_QUEUE        name of the original queue to which the job was submitted
$PBS_O_SYSTEM       operating system name given by uname -s on $PBS_O_HOST
$PBS_O_WORKDIR      absolute path of the directory from which the qsub command was given
$PBS_ENVIRONMENT    either PBS_BATCH or PBS_INTERACTIVE
$PBS_JOBID          job identifier assigned to the job by the batch system
$PBS_JOBNAME        job name supplied by the user
$PBS_NODEFILE       pathname of the file containing the list of nodes assigned to the job
$PBS_QUEUE          name of the queue from which the job is executed
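A minimal sketch of a job script that uses these variables is shown below. The ${VAR:-default} fallbacks are an assumption of this sketch, added only so the script can be dry-run outside PBS; under qsub, PBS sets the variables itself. The log file name is hypothetical.

```bash
#!/bin/bash
#PBS -N MyJob
#PBS -m be
#PBS -k oe
# Sketch of a job script that relies on PBS-provided variables.
# ASSUMPTION: the :- fallbacks exist only for dry runs outside PBS.
workdir=${PBS_O_WORKDIR:-$PWD}
jobid=${PBS_JOBID:-interactive}
nodefile=${PBS_NODEFILE:-/dev/null}

# Run from the directory where qsub was typed
# (PBS jobs otherwise typically start in $HOME).
cd "$workdir" || exit 1

# Number of lines PBS wrote to the node file; $(( )) strips any padding from wc.
nlines=$(( $(wc -l < "$nodefile") ))

# Name the log after the job id so concurrent jobs never share a file.
log="MyJob.$jobid.log"
echo "job $jobid: $nlines node-file entries, working in $workdir" > "$log"
# myprog < /data/me/mydata >> "$log"   # the example program from the text
```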
Monitoring and deleting qsub jobs
Monitor through the web: http://biowulf.nih.gov/sysmon/
From the command line, freen reports free/total nodes by node type and memory size:
[biowulf]$ freen
                m1024    m2048    m4096    m8192    Total
----------------- GeneralPool -----------------
o2800           /        /        0/210    /        0/210
o2200           /        22/232   0/58     /        22/290
o2000           /        17/40    /        /        17/40
p2800           2/79     91/195   0/62     /        93/336
----------------- Myrinet -----------------
o2200           /        34/71    /        /        34/71
o2000           38/47    /        /        /        38/47
p2800           37/38    /        /        /        37/38
----------------- Infiniband -----------------
o2800           /        /        14/93    /        14/93
----------------- Reserved -----------------
o2800           /        46/89    /        27/34    73/123
o2600           /        /        46/274   /        46/274
------------------- Altix --------------------
Available: 15 processors, 15.2 GB memory
qstat -u user displays a simple list of jobs for a single user:
[biowulf]$ qstat -u me
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
999999.biobos me norm MyJob 8713 1 1 -- -- R 99:99
qstat -f jobid displays a detailed report of a single job:
[biowulf]$ qstat -f 999999.biobos
Job Id: 999999.biobos
Job_Name = MyJob
Job_Owner = me@p1397
resources_used.cpupercent = 98
resources_used.cput = 23:26:24
resources_used.mem = 9452kb
resources_used.ncpus = 1
resources_used.vmem = 152328kb
resources_used.walltime = 23:26:56
job_state = R
queue = norm
server = biobos
Checkpoint = u
ctime = Wed Jan 4 14:25:35 2006
Error_Path = p1397:/home/me/MyJob.e999999
exec_host = p295/0
Hold_Types = n
Join_Path = oe
Keep_Files = n
Mail_Points = ae
mtime = Wed Jan 4 14:27:00 2006
Output_Path = p1397:/home/me/MyJob.o999999
Priority = 0
qtime = Wed Jan 4 14:25:35 2006
Rerunable = True
Resource_List.ncpus = 1
Resource_List.neednodes = 1:faste
Resource_List.nodect = 1
Resource_List.nodes = 1:faste
session_id = 8713
Variable_List = PBS_O_HOME=/home/me,PBS_O_LANG=en_US,
comment = Job run at Wed Jan 04 at 14:26
etime = Wed Jan 4 14:25:35 2006
qdel jobid kills the job
[biowulf]$ qdel 999999.biobos
If a plain qdel does not remove the job, force it:
[biowulf]$ qdel -W force 999999.biobos
Delete all your jobs:
[biowulf]$ qdel -W force `qselect -u me`
The swarm command
A large set of independent processes can be submitted to the cluster automatically, without creating a qsub script for each process. First create a swarm file (MyJob.swarm) with one command per line:
cd /home/me/a; myprog -param a < infile-a > outfile-a
cd /home/me/b; myprog -param b < infile-b > outfile-b
cd /home/me/c; myprog -param c < infile-c > outfile-c
cd /home/me/d; myprog -param d < infile-d > outfile-d
cd /home/me/e; myprog -param e < infile-e > outfile-e
cd /home/me/f; myprog -param f < infile-f > outfile-f
cd /home/me/g; myprog -param g < infile-g > outfile-g
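For long lists, a swarm file like the seven lines above is easier to generate than to type. A minimal sketch, using the text's placeholder program and paths (make_swarm_file is a hypothetical helper name):

```bash
#!/bin/bash
# Sketch: write one command per line to a swarm file, reproducing the
# a..g example. myprog and /home/me/... are the text's placeholders.
make_swarm_file() {
    local out=$1; shift
    : > "$out"                      # truncate (or create) the swarm file
    for p in "$@"; do
        echo "cd /home/me/$p; myprog -param $p < infile-$p > outfile-$p" >> "$out"
    done
}

make_swarm_file MyJob.swarm a b c d e f g
```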
Submit the swarm job:
[biowulf]$ swarm -f MyJob.swarm -V -l nodes=1:x86-64
720749.biobos
720750.biobos
720751.biobos
720752.biobos
Bundled swarm jobs:
If a single swarm file contains thousands and thousands of processes, each lasting only a minuscule amount of time, it is better to run a block of processes serially on a single host than to spawn a new batch job for each process. This reduces the load on PBS and is a much more efficient use of time:
[biowulf]$ swarm -b 100 -f MyJob.swarm -V -l nodes=1:x86-64
720805.biobos
720806.biobos
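The effect of bundling can be estimated with a little arithmetic. This sketch ASSUMES (it is not documented swarm behavior) that each node runs 2 processes at once and that -b b makes each process slot run b commands in a row, so N commands give ceil(N / 2b) jobs; with b = 1 this matches the 4 jobs produced by the 7-command file above. swarm_job_count is a hypothetical helper.

```bash
#!/bin/bash
# Sketch: estimate how many PBS jobs a swarm file generates.
# ASSUMPTIONS (not documented swarm behavior): each node runs
# $cpus processes at once (default 2), and with -b each process
# slot runs $bundle commands one after another.
swarm_job_count() {
    local ncmds=$1 bundle=${2:-1} cpus=${3:-2}
    local per_job=$(( bundle * cpus ))
    # integer ceiling division: (n + d - 1) / d
    echo $(( (ncmds + per_job - 1) / per_job ))
}

swarm_job_count 7           # 7 commands, no bundling -> 4
swarm_job_count 10000 100   # 10000 commands, -b 100  -> 50
```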
Deleting a single set of swarm jobs:
Deleting a single set of swarm jobs in the midst of other jobs in the batch queue is tricky. The swarmdel command simplifies this:
Type the swarmdel command using one of the jobids as the argument:
[biowulf]$ swarmdel 720751.biobos
720749 'swarm1n29087' deleted
720750 'swarm2n29087' deleted
720751 'swarm3n29087' deleted
720752 'swarm4n29087' deleted
MPI and multirun
MPI is a library specification for message-passing, proposed as a standard by a broadly based committee of vendors, implementors, and users. A program is compiled using MPICH (TCP/IP) or MPICH-GM (Myrinet GM), and the program is run using the command mpirun:
mpirun -nolocal -machinefile $PBS_NODEFILE -np 8 MyProg
Here is a typical batch command file to run an MPI-compiled program (AMBER):
#!/bin/csh
#PBS -N sander
#PBS -m be
#PBS -k oe
set path = (/usr/local/mpich-pg/bin $path)
set file=/data/me/amber/dinuc_test
cd /data/me/amber/nomyri
date
mpirun -machinefile $PBS_NODEFILE -np $np /usr/local/amber/exe.mpich-pg/sander \
  -i $file.in -o $file.out -p $file.top -c $file.coor -x $file.crd -e $file.en \
  -inf $file.info -r $file.rst
This script can be submitted with qsub, passing the number of MPI processes ($np) with -v:
[biowulf]$ qsub -v np=8 -l nodes=4:o2200 amber.run
The multirun command
multirun is similar to swarm but more controlled (and often less efficient): it creates a single job with unified STDOUT and STDERR output files.
1. Create an executable shell script which will run multiple instances of your program (run6.sh):
MyProg < args0
MyProg < args1
MyProg < args2
MyProg < args3
MyProg < args4
MyProg < args5
2. Use mpirun in your batch command file (MyJob.sh) to run the commands in run6.sh through multirun:
#!/bin/csh
#PBS -N MyJob
#PBS -m be
#PBS -k oe
set path=(/usr/local/mpich/bin $path)
mpirun -machinefile $PBS_NODEFILE -np 6 \
/usr/local/bin/multirun -m /home/me/run6.sh
3. Submit the job to the batch system:
[biowulf]$ qsub -l nodes=3 MyJob.sh
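The command file from step 1 can also be generated with a loop. In this sketch MyProg and args0..args5 are the text's placeholders and make_multirun_file is a hypothetical helper; the line count presumably has to match the -np value given to mpirun (6 here).

```bash
#!/bin/bash
# Sketch: generate the multirun command file instead of typing it.
# MyProg and the argsN input files are the text's placeholders.
make_multirun_file() {
    local out=$1 n=$2
    : > "$out"
    for (( i = 0; i < n; i++ )); do
        echo "MyProg < args$i" >> "$out"
    done
    chmod +x "$out"   # step 1 asks for an executable file
}

make_multirun_file run6.sh 6
```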
Large-scale structural biology
The term "large-scale" here refers to repetitively executing a series of programs on a large number of individual inputs (protein structures, nucleotide sequences, data sets, etc.).
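One way to set this up, sketched below under the assumption that each input is a PDB file in a single directory, is to write one swarm line per input that chains the whole series of programs with &&, so a failing step stops only that input's pipeline. step1, step2 and make_pipeline_swarm are hypothetical names; for real use, pass an absolute directory path.

```bash
#!/bin/bash
# Sketch: one swarm line per input file, chaining a series of programs.
# step1/step2 are hypothetical programs; replace them with real commands.
make_pipeline_swarm() {
    local indir=$1 out=$2
    : > "$out"
    for f in "$indir"/*.pdb; do
        [ -e "$f" ] || continue          # no match: skip the literal pattern
        local base
        base=$(basename "$f" .pdb)
        echo "cd $indir && step1 $base.pdb > $base.s1 && step2 $base.s1 > $base.out" >> "$out"
    done
}

# Example with two empty placeholder inputs.
mkdir -p pdbs && touch pdbs/1abc.pdb pdbs/2xyz.pdb
make_pipeline_swarm pdbs pipeline.swarm
```

The resulting file is submitted like any other swarm file, with swarm -f pipeline.swarm.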
Practical tips for parallelizing jobs using scripts
Important commands and tools:
Managing I/O, memory, and disk space requirements
This document is available as http://helix.nih.gov/talks/strbio.html
Last modified: 05 Dec 2007