Estimating empirical codon hidden Markov models

Nicola De Maio, Ian Holmes, Christian Schlötterer, Carolin Kosiol

Research output: Contribution to journal › Article › peer-review


Empirical codon models (ECMs) estimated from a large number of globular protein families outperformed mechanistic codon models in their description of the general process of protein evolution. Among other factors, ECMs implicitly model the influence of amino acid properties and multiple nucleotide substitutions (MNSs). However, the estimation of ECMs requires large quantities of data, and until recently, only a few suitable data sets were available. Here, we take advantage of several new Drosophila species genomes to estimate codon models from genome-wide data. The availability of large numbers of genomes over varying phylogenetic depths in the Drosophila genus allows us to explore various divergence levels. Consequently, we can use these data to determine the appropriate level of divergence for the estimation of ECMs, avoiding the overestimation of MNS rates caused by saturation. To account for variation in evolutionary rates along the genome, we develop new empirical codon hidden Markov models (ecHMMs). These models significantly outperform previous ones with respect to maximum likelihood values, suggesting that they provide a better fit to the evolutionary process. Using ECMs and ecHMMs derived from genome-wide data sets, we devise new likelihood ratio tests (LRTs) of positive selection. We found classical LRTs to be very sensitive to the presence of MNSs, showing high false-positive rates, especially with small phylogenies. The new LRTs are more conservative than the classical ones, having acceptable false-positive rates and reduced power.
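For readers unfamiliar with likelihood ratio tests, the generic calculation behind a test of two nested models can be sketched as follows. This is only a minimal illustration, not the procedure used in the article: the function name lrt_p_value, the log-likelihood values, and the plain chi-square null distribution are all assumptions made for the example, and tests of positive selection may rely on a different null distribution.

```python
import math


def lrt_p_value(lnl_null, lnl_alt, df=1):
    """Return (statistic, p-value) for a likelihood ratio test of nested models.

    lnl_null and lnl_alt are maximized log-likelihoods of the null and the
    alternative model; df is the difference in free parameters (1 or 2 here).
    """
    if df not in (1, 2):
        raise ValueError("this sketch only covers df = 1 or df = 2")
    # Test statistic: twice the gain in log-likelihood. Small negative values
    # caused by optimizer noise are clamped to zero.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Closed-form chi-square survival functions for df = 1 and df = 2.
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2.0))
    else:
        p = math.exp(-stat / 2.0)
    return stat, p


# Hypothetical log-likelihoods, chosen only to illustrate the call.
stat, p = lrt_p_value(-1000.0, -998.0, df=1)
print(f"2*deltaLnL = {stat:.2f}, p = {p:.4f}")
```

A small p-value means the null model is rejected in favor of the alternative. A test is more conservative when it rejects the null less often, which lowers both the false-positive rate and the power.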

Original language: English
Pages (from-to): 725-736
Number of pages: 12
Journal: Molecular Biology and Evolution
Issue number: 3
Early online date: 27 Nov 2012
Publication status: Published - Mar 2013


  • Empirical codon model
  • Rate heterogeneity
  • Hidden Markov models
  • Positive selection
  • Drosophila substitution patterns

