Journal articles

Back-translation for discovering distant protein homologies in the presence of frameshift mutations

Marta Gîrdea (1, 2, *), Laurent Noé (2, 1, *), Gregory Kucherov (2, 3, 1, *)
* Corresponding author
1 SEQUOIA2 - Algorithms for large scale sequence analysis
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe
Abstract
Background
Frameshift mutations in protein-coding DNA sequences drastically change the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins' common origin. Moreover, when the divergence also involves a large number of substitutions, homology detection becomes difficult even at the DNA level.
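The effect of a frameshift described above is easy to check directly. The sketch below is our own illustration, not taken from the paper: it assumes the standard genetic code, and both the helper `translate` and the short coding sequence are hypothetical. Deleting a single nucleotide leaves only the first codon intact; every downstream residue changes.

```python
# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in reading frame 0, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, len(dna) - 2, 3))


original = "ATGGCTTCAGAAGTTCTG"          # hypothetical coding sequence
mutated = original[:3] + original[4:]   # delete one nucleotide: a -1 frameshift

print(translate(original))  # MASEVL
print(translate(mutated))   # MLQKF
```

The two DNA sequences still share 17 of 18 nucleotides, yet the proteins agree only on the initial methionine, which is why protein-level alignment misses such homologies.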
Results
We developed a novel method to infer distant homology relations between two proteins which accounts for frameshift and point mutations that may have affected the coding sequences. We designed a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein. Its goal is to determine the two putative DNA sequences with the best-scoring alignment under a scoring system designed to reflect the most probable evolutionary process. Our implementation is freely available at http://bioinfo.lifl.fr/path/.
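The core idea (align two proteins by searching over all of their putative DNA sequences at once) can be sketched in a few dozen lines of Python. This is a deliberately simplified illustration, not the authors' implementation: it assumes the standard genetic code, builds a naive back-translation graph (one three-node path per codon, with every codon of one residue linked to every codon of the next) rather than the paper's memory-efficient representation, and uses hypothetical plain nucleotide scores (`match`, `mismatch`, `gap`) instead of the paper's scoring system. The names `back_translation_graph` and `best_alignment_score` are our own.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Back-translation table: amino acid -> every codon that encodes it.
BACK = {}
for n, codon in enumerate(a + b + c for a in BASES for b in BASES for c in BASES):
    BACK.setdefault(AMINO[n], []).append(codon)


def back_translation_graph(protein):
    """Build a DAG whose source-to-end paths spell every DNA sequence
    encoding `protein`. Node 0 is an unlabeled source; every other node
    carries one nucleotide. Returns (labels, preds, ends)."""
    labels, preds = [None], [[]]
    prev_ends = [0]  # nodes where the previous codon finishes
    for aa in protein:
        new_ends = []
        for codon in BACK[aa]:
            last = prev_ends  # first base of this codon follows every previous codon
            for base in codon:
                labels.append(base)
                preds.append(list(last))
                last = [len(labels) - 1]
            new_ends.append(last[0])
        prev_ends = new_ends
    return labels, preds, prev_ends


def best_alignment_score(p1, p2, match=2, mismatch=-1, gap=-2):
    """Best global alignment score over all pairs of putative DNA sequences
    of p1 and p2, computed by dynamic programming over the two graphs."""
    l1, pr1, e1 = back_translation_graph(p1)
    l2, pr2, e2 = back_translation_graph(p2)
    NEG = float("-inf")
    # D[u][v]: best score aligning a source path ending at u with one ending at v.
    D = [[NEG] * len(l2) for _ in l1]
    D[0][0] = 0
    # Node indices form a topological order, so predecessors are always filled.
    for u in range(len(l1)):
        for v in range(len(l2)):
            if u == 0 and v == 0:
                continue
            best = NEG
            if u and v:  # align nucleotide u with nucleotide v
                s = match if l1[u] == l2[v] else mismatch
                best = max(D[a][b] + s for a in pr1[u] for b in pr2[v])
            if u:  # nucleotide u against a gap
                best = max([best] + [D[a][v] + gap for a in pr1[u]])
            if v:  # nucleotide v against a gap
                best = max([best] + [D[u][b] + gap for b in pr2[v]])
            D[u][v] = best
    return max(D[u][v] for u in e1 for v in e2)


print(best_alignment_score("MASEVL", "MLQKF"))  # 24
```

For the hypothetical pair `"MASEVL"` and `"MLQKF"`, which share only their first residue, the sketch finds the putative DNA sequences ATGGCTTCAGAAGTTCTG and ATGCTTCAGAAGTTC: 15 matches and 3 gaps give 15 × 2 − 3 × 2 = 24. The single-nucleotide gap after ATG is a frameshift. Here it is scored like any other gap, whereas a dedicated scoring system would penalize it separately. The table has |V1| × |V2| cells, so the graph size directly drives time and memory; a traceback over `D` would recover the two DNA sequences themselves.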
Conclusions
Our approach uncovers evolutionary information that traditional alignment methods do not capture, as confirmed by biologically significant examples.

Cited literature: 40 references

https://hal.inria.fr/hal-00784444
Contributor: Ed. Bmc
Submitted on: Monday, February 4, 2013 - 1:10:45 PM
Last modification on: Wednesday, December 9, 2020 - 6:02:06 PM
Long-term archiving on: Saturday, April 1, 2017 - 3:43:54 PM

Files

1748-7188-5-6.pdf
Publisher files allowed on an open archive

Identifiers

Citation

Marta Gîrdea, Laurent Noé, Gregory Kucherov. Back-translation for discovering distant protein homologies in the presence of frameshift mutations. Algorithms for Molecular Biology, BioMed Central, 2010, 5 (1), pp.6. ⟨10.1186/1748-7188-5-6⟩. ⟨hal-00784444⟩


Metrics

Record views: 451

File downloads: 460