A Branch-Heterogeneous Model of Protein Evolution for Efficient Inference of Ancestral Sequences

Abstract: Most models of nucleotide or amino acid substitution used in phylogenetic studies assume that the evolutionary process has been homogeneous across lineages and that the composition of nucleotides or amino acids has remained the same throughout the tree. These oversimplified assumptions are refuted by the observation that compositional variability characterizes extant biological sequences. Branch-heterogeneous models of protein evolution that account for compositional variability have been developed, but they are not yet in common use because of the large number of parameters they require, which leads to high computational costs and potential overparameterization. Here, we present a new branch-nonhomogeneous and nonstationary model of protein evolution that more accurately captures the high complexity of sequence evolution. This model, henceforth called Correspondence and likelihood analysis (COaLA), uses a correspondence analysis to reduce the number of parameters to be optimized through maximum likelihood, focusing on most of the compositional variation observed in the data. The model was thoroughly tested on both simulated and biological data sets to show its high performance in terms of data fitting and CPU time. COaLA efficiently estimates ancestral amino acid frequencies and sequences, making it relevant for studies that aim to reconstruct and resurrect ancestral amino acid sequences. Finally, we applied COaLA to a concatenation of universal amino acid sequences to confirm previous results obtained with a nonhomogeneous Bayesian model regarding the early pattern of adaptation to optimal growth temperature, supporting the mesophilic nature of the Last Universal Common Ancestor.
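
The dimension-reduction idea mentioned in the abstract can be illustrated with a textbook correspondence analysis of a composition table. The sketch below is not the authors' implementation: the function name, the toy counts, and the 90% inertia threshold are all assumptions made for illustration. It only shows how a few factorial axes can summarize most of the compositional variation in a sequences-by-amino-acids table.

```python
import numpy as np


def correspondence_analysis(counts):
    """Textbook correspondence analysis of a table of non-negative counts.

    Returns the row principal coordinates and the inertia carried by each axis.
    Every row and every column must have a positive sum.
    """
    counts = np.asarray(counts, dtype=float)
    P = counts / counts.sum()          # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    expected = np.outer(r, c)          # table expected under independence
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sing) / np.sqrt(r)[:, None]
    return row_coords, sing ** 2


# Hypothetical toy table: 4 sequences (rows) x 5 amino acid classes (columns).
toy_counts = np.array([
    [30, 12, 8, 25, 25],
    [28, 14, 9, 24, 25],
    [10, 30, 25, 20, 15],
    [12, 28, 27, 18, 15],
])

coords, inertia = correspondence_analysis(toy_counts)
explained = inertia / inertia.sum()

# Smallest number of axes whose cumulative inertia reaches 90% (assumed cutoff).
n_axes = int(np.searchsorted(np.cumsum(explained), 0.9) + 1)
print("share of inertia per axis:", np.round(explained, 3))
print("axes retained:", n_axes)
```

In a setting like the one the abstract describes, the coordinates along the retained axes would stand in for the full set of amino acid frequencies, so fewer quantities need to be optimized; how COaLA actually maps axes to model parameters is described in the paper itself.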


https://hal.archives-ouvertes.fr/hal-02320419
Contributor: Manolo Gouy
Submitted on: Friday, October 18, 2019 - 4:51:27 PM
Last modification on: Wednesday, October 23, 2019 - 4:13:05 PM

File

Groussin-Syst Biol-2013.pdf
Publisher files allowed on an open archive

Citation

M. Groussin, Bastien Boussau, Manolo Gouy. A Branch-Heterogeneous Model of Protein Evolution for Efficient Inference of Ancestral Sequences. Systematic Biology, Oxford University Press (OUP), 2013, 62 (4), pp.523-538. ⟨10.1093/sysbio/syt016⟩. ⟨hal-02320419⟩
