Learning canonical Potts models - Inria - Institut national de recherche en sciences et technologies du numérique
Master Thesis, Year: 2023

Learning canonical Potts models

Abstract

The number of registered protein sequences grows much faster than the number of experimentally annotated ones. A standard approach to mitigating this ever-increasing gap is to transfer annotations from a known sequence to other proteins of common ancestry, called homologs, which may have preserved a similar structure and function. Doing so requires identifying the homologous sequences of a given query sequence, a task referred to as homology search. Many methods have been proposed to retrieve homologs, from dynamic programming alignments to more complex methods such as specific kinds of hidden Markov models, which are today’s state of the art for this task. However, none of these approaches considers dependencies between distant pairs of positions in the protein sequence. In parallel, recent work in another field, contact prediction, has shown promising results in identifying the 3D contact points of a folded protein. This progress was driven by global statistical models, including Potts models, a specific kind of Markov random field that represents both positional information and dependencies between pairs of positions in a sequence. Because of these properties, Potts models were adopted by Talibart in his thesis [20], with the aim of identifying remote homologs better than current techniques by aligning these global models with one another. However, this task unveiled new challenges. Because of their intrinsic overparametrization, and because of sampling biases that could not be easily handled, the inferred models to be aligned are not comparable. Both issues were studied during this internship, with the aim of making the whole alignment workflow fully operational: first by searching for a relevant canonical form of Potts models to eliminate unwanted divergence between parameters, and then by working on the model inference itself, exploring how explicit covariance-based methods that can overcome these sampling issues could be adapted to directly infer comparable models.
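To make the overparametrization issue concrete, the following is a minimal sketch, not the method developed in the thesis. It assumes a toy Potts model with fields `h[i][a]` and couplings `J[(i, j)][a][b]`, and uses the widely used zero-sum gauge as one possible canonical form; the abstract does not state which canonical form was retained. All function names (`score`, `zero_sum_gauge`, `random_gauge_shift`, and the helpers) are hypothetical and introduced only for illustration.

```python
import random


def score(h, J, x):
    """Potts score of sequence x: sum of fields plus pairwise couplings."""
    n = len(h)
    s = sum(h[i][x[i]] for i in range(n))
    s += sum(J[i, j][x[i]][x[j]] for i in range(n) for j in range(i + 1, n))
    return s


def zero_sum_gauge(h, J):
    """Return equivalent parameters in the zero-sum gauge (an assumed choice).

    Rows and columns of every coupling matrix, and every field vector,
    sum to zero. Scores change only by a sequence-independent constant.
    """
    q = len(h[0])
    new_h = [[v - sum(row) / q for v in row] for row in h]
    new_J = {}
    for (i, j), m in J.items():
        row_mean = [sum(m[a]) / q for a in range(q)]
        col_mean = [sum(m[a][b] for a in range(q)) / q for b in range(q)]
        all_mean = sum(row_mean) / q
        new_J[i, j] = [
            [m[a][b] - row_mean[a] - col_mean[b] + all_mean for b in range(q)]
            for a in range(q)
        ]
        # The part of the coupling that depends on one position only
        # is moved into the field of that position.
        for a in range(q):
            new_h[i][a] += row_mean[a] - all_mean
            new_h[j][a] += col_mean[a] - all_mean
    return new_h, new_J


def random_model(n, q, rng):
    """Random toy model with n positions and q states (illustration only)."""
    h = [[rng.uniform(-1, 1) for _ in range(q)] for _ in range(n)]
    J = {
        (i, j): [[rng.uniform(-1, 1) for _ in range(q)] for _ in range(q)]
        for i in range(n)
        for j in range(i + 1, n)
    }
    return h, J


def random_gauge_shift(h, J, rng):
    """Move arbitrary amounts between couplings and fields; scores are unchanged."""
    q = len(h[0])
    h2 = [row[:] for row in h]
    J2 = {key: [row[:] for row in m] for key, m in J.items()}
    for (i, _j), m in J2.items():
        for a in range(q):
            k = rng.uniform(-1, 1)
            for b in range(q):
                m[a][b] += k
            h2[i][a] -= k
    return h2, J2


def flatten(h, J):
    """Flatten all parameters into one list, in a fixed order, for comparison."""
    values = [v for row in h for v in row]
    for key in sorted(J):
        values += [v for row in J[key] for v in row]
    return values


def max_abs_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


rng = random.Random(0)
h, J = random_model(3, 4, rng)
h_shift, J_shift = random_gauge_shift(h, J, rng)

print("raw parameter gap:      ", max_abs_diff(flatten(h, J), flatten(h_shift, J_shift)))
print("canonical parameter gap:", max_abs_diff(flatten(*zero_sum_gauge(h, J)),
                                               flatten(*zero_sum_gauge(h_shift, J_shift))))
```

The two parameter sets define the same score for every sequence, yet their raw values differ, which is why raw inferred models cannot be compared directly. After the gauge fix, the gap should vanish up to floating-point error. Sampling biases, the second issue named in the abstract, are not addressed by this sketch.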
Embargoed file
Sunday, January 11, 2026

Dates and versions

hal-04388795, version 1 (11-01-2024)

Licence

Attribution

Identifiers

  • HAL Id: hal-04388795, version 1

Cite

Pablo Espana Gutierrez. Learning canonical Potts models. Computer Science [cs]. 2023. ⟨hal-04388795⟩