PAML is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood. It is developed and maintained by Ziheng Yang at University College London. (PAML website)
PAML is intended to be used interactively on Helix. To run a PAML job, log on to helix.nih.gov using ssh. The PAML executables can be added to your path with the command module load paml, which loads the latest version of PAML. If you want a specific version, use module avail paml to see what is available, and then module load paml/version to load that version.
PAML programs are:
- baseml and codeml
- The program baseml is for maximum likelihood analysis of nucleotide sequences. The program codeml is formed by merging two old programs: codonml, which implements the codon substitution model of Goldman and Yang (1994) for protein-coding DNA sequences, and aaml, which implements models for amino acid sequences. These two are now distinguished by the variable seqtype in the control file codeml.ctl, with 1 for codon sequences and 2 for amino acid sequences. In this document I use codonml and aaml to mean codeml with seqtype = 1 and 2, respectively. The programs baseml, codonml, and aaml use similar algorithms to fit models by maximum likelihood, the main difference being that the unit of evolution in the Markov model, referred to as a "site" in the sequence, is a nucleotide, a codon, or an amino acid for the three programs, respectively. Markov process models are used to describe substitutions between nucleotides, codons or amino acids, with substitution rates assumed to be either constant or variable among sites.
- evolver
- This program can be used to simulate sequences under nucleotide, codon, and amino acid substitution models. It also has some other options, such as generating random trees and calculating the partition distances (Robinson and Foulds 1981) between trees.
- basemlg
- This program implements the (continuous) gamma model of Yang (1993). It is very slow and infeasible for data sets of more than 6 or 7 species; the discrete-gamma model in baseml should be used instead.
- mcmctree
- This implements the Bayesian MCMC algorithm of Yang and Rannala (2006) and Rannala and Yang (2007) for estimating species divergence times.
- pamp
- This implements the parsimony-based analysis of Yang and Kumar (1996).
- yn00
- This implements the method of Yang and Nielsen (2000) for estimating synonymous and nonsynonymous substitution rates (dS and dN) in pairwise comparisons of protein-coding DNA sequences.
- chi2
- This is for conducting likelihood ratio tests. It calculates the chi-square critical values, which you can compare with the test statistic calculated from your data to determine whether the test is significant at the 5% or 1% level. Run the program by typing its name, chi2. The program can also calculate the P value when you supply the test statistic and the degrees of freedom (d.f.); run it by typing chi2 p.
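The program list notes that codeml switches between codon and amino acid analysis through the variable seqtype in codeml.ctl. The following is a minimal sketch of writing such a control-file fragment from a POSIX shell (sh or bash). The function name write_codeml_ctl and the file names example.phy, example.tree, and mlc are hypothetical placeholders, and a real analysis would normally set further variables (see the PAML User Guide).

```shell
# Hypothetical helper: write a minimal codeml control-file fragment.
#   $1 = seqtype (1 for codon sequences, 2 for amino acid sequences)
#   $2 = path of the control file to write
# In PAML control files, text after a * is a comment.
write_codeml_ctl() {
    cat > "$2" <<EOF
      seqfile = example.phy   * hypothetical sequence file name
     treefile = example.tree  * hypothetical tree file name
      outfile = mlc           * hypothetical output file name
      seqtype = $1            * 1: codons (codonml); 2: amino acids (aaml)
EOF
}

# Write a codon-sequence fragment into the current directory.
# Note: this overwrites any existing codeml.ctl there.
write_codeml_ctl 1 ./codeml.ctl
```

Keeping seqtype as a parameter makes it easy to prepare the codon and amino acid variants of the same analysis side by side.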
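To illustrate the comparison that the likelihood ratio test involves, here is a minimal sketch in POSIX shell with awk. It assumes two nested models that differ by one parameter (1 d.f.), for which the standard chi-square critical values are 3.841 (5%) and 6.635 (1%). The function name lrt_1df is hypothetical, and the log-likelihood values in the usage line are made up for illustration, not taken from a real analysis.

```shell
# Hypothetical helper: likelihood ratio test for two nested models, 1 d.f.
#   $1 = lnL0, log-likelihood of the null (simpler) model
#   $2 = lnL1, log-likelihood of the alternative (more general) model
lrt_1df() {
    awk -v l0="$1" -v l1="$2" 'BEGIN {
        stat = 2 * (l1 - l0)     # test statistic: twice the lnL difference
        if (stat >= 6.635) {     # chi-square critical value, 1 d.f., 1%
            verdict = "significant at the 1% level"
        } else if (stat >= 3.841) {  # chi-square critical value, 1 d.f., 5%
            verdict = "significant at the 5% level"
        } else {
            verdict = "not significant at the 5% level"
        }
        printf "2*(lnL1 - lnL0) = %.2f: %s\n", stat, verdict
    }'
}

# Made-up log-likelihoods, for illustration only.
lrt_1df -2399.10 -2395.60
# prints: 2*(lnL1 - lnL0) = 7.00: significant at the 1% level
```

For tests with other degrees of freedom the critical values differ, so the hard-coded thresholds in this sketch would need to be replaced.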
PAML is not good for tree making. There are a few options for heuristic tree search, but they do not work well except for small data sets of only a few species. If you hope to use PAML to compare trees from relatively large data sets, one possibility is to get a collection of candidate trees and then compare them using more sophisticated models implemented in PAML. You can get candidate trees by using other programs/methods implemented in PAUP*, PHYLIP, MOLPHY etc.
PAML may be useful if you are interested in the process of sequence evolution. The two main programs, baseml and codeml, implement a number of sophisticated models, which you can use to construct likelihood ratio tests of evolutionary hypotheses. Right now, the following options/models do not seem available in other packages.
Sample session

A set of example input files for PAML is available in /usr/local/apps/paml/paml4.7/examples. Feel free to copy them to your own directory for testing.
    % module load paml/4.7
    % mkdir /data/$USER/pamlfiles/
    % cd /data/$USER/pamlfiles
    % cp /usr/local/apps/paml/paml4.7/examples/MouseLemurs/* .
    % baseml
    BASEML in paml version 4.7, January 2013
    ns = 35 ls = 1812
    Reading sequences, sequential format..
    Reading seq #35: M.rufus2
    Sequences read..
    Counting site patterns.. 0:00
    1023 patterns at 1812 / 1812 sites (100.0%), 0:00
    Counting frequencies..
    7140 bytes for distance
    1113024 bytes for conP
    40920 bytes for fhK
    8000000 bytes for space
    3 branch types are in tree. Stop if wrong.
    TREE # 1: (((((((((((27, (28, (35, (30, (34, (32, 29)))))), (31, 33)), 26), 25), 19), 20), (((22, 23), 24), 21)), 18), (16, 17)), (15, (14, (13, (12, (11, 10)))))), ((9, (7, 8)), ((((4, 3), 5), 6), (2, 1)))); MP score: -1.00
    [... etc ...]
    64 h-m-p 1.6000 8.0000 0.0000 C 25976.769529 0 2.0656 3503
    Out...
    lnL = -25976.769529
    Calculating SE's
    Time used: 0:48
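The log-likelihood reported in the session (lnL = -25976.769529) is the number you would carry forward into a likelihood ratio test. Below is a minimal sketch, in POSIX shell with sed, of pulling that value out of saved screen output. It assumes the output contains text of the form "lnL = value" as shown in the session. The function name extract_lnl is hypothetical, and saving the output to a file (for example with a shell redirect) is an assumption about your workflow rather than something PAML requires.

```shell
# Hypothetical helper: print every lnL value found in baseml/codeml screen
# output. Reads the named file(s), or standard input if none are given.
# Assumes the value appears as "lnL = <number>", with optional extra spaces.
extract_lnl() {
    sed -n 's/.*lnL *= *\(-*[0-9][0-9.]*\).*/\1/p' "$@"
}

# Example using the line from the sample session.
printf 'Out...\nlnL = -25976.769529\nCalculating SE\n' | extract_lnl
# prints: -25976.769529
```

If the run evaluates more than one tree, this sketch prints one value per matching line, in the order they appear.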
PAML User Guide (PDF)
PAML FAQ (PDF)