Learning canonical Potts models

Pablo Espana Gutierrez

Master thesis. Year: 2023

Dynamics, Logics and Inference for biological Systems and Sequences

Abstract

The number of registered protein sequences grows much faster than the number of experimentally annotated ones. A standard approach to mitigating this ever-increasing gap is to transfer annotations from a known sequence to other proteins of common ancestry, called homologs, which may have preserved a similar structure and function. Doing so requires identifying the homologs of a given query sequence, a task referred to as homology search. Many methods have been proposed to retrieve homologs, ranging from dynamic-programming alignments to more elaborate approaches such as profile Hidden Markov models, which are today's state of the art for this task. However, none of these approaches accounts for dependencies between distant pairs of positions in the protein sequence. In parallel, recent work in another field, contact prediction, has shown promising results in identifying the residues that are in 3D contact once the protein is folded. This progress was driven by global statistical models, including Potts models, a class of Markov random fields that represent both positional information and dependencies between pairs of positions in a sequence. Because of these properties, Talibart adopted Potts models in his thesis [20], aiming to identify remote homologs better than current techniques by aligning these global models with each other. However, this task revealed new challenges. Because Potts models are intrinsically overparametrized (many different parameter sets define the same probability distribution), and because of sampling biases that are not easily handled, the inferred models to be aligned are not directly comparable. Both issues were studied during this internship with the goal of making the whole alignment workflow fully operational: first by searching for a relevant canonical form of Potts models that removes unwanted divergence between parameters, and then by working on the model inference itself, exploring how explicit covariance-based methods, which can overcome these sampling issues, could be adapted to infer comparable models directly.
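To make the overparametrization issue concrete: a Potts model over sequences of length L with q states has fields h_i(a) and couplings J_ij(a, b), and adding suitable offsets to these parameters leaves the distribution unchanged. The sketch below illustrates this with the zero-sum gauge, one standard canonical form from the Potts/DCA literature; the abstract does not say that this is the canonical form retained in the thesis. The energy convention E(x) = -Σ h_i(x_i) - Σ_{i<j} J_ij(x_i, x_j), the array layout (J of shape (L, L, q, q), stored symmetrically with zero diagonal blocks) and the helper names `energy` and `zero_sum_gauge` are hypothetical choices made for illustration.

```python
import numpy as np


def energy(seq, h, J):
    """Potts energy E(x) = -sum_i h_i(x_i) - sum_{i<j} J_ij(x_i, x_j)."""
    L = len(seq)
    e = -sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            e -= J[i, j, seq[i], seq[j]]
    return e


def zero_sum_gauge(h, J):
    """Hypothetical helper: map (h, J) to the zero-sum gauge.

    Assumes h has shape (L, q) and J has shape (L, L, q, q) with
    J[i, j, a, b] == J[j, i, b, a] and zero diagonal blocks J[i, i].
    Energies change only by a sequence-independent constant.
    """
    J_row = J.mean(axis=3, keepdims=True)        # J_ij(a, .)
    J_col = J.mean(axis=2, keepdims=True)        # J_ij(., b)
    J_all = J.mean(axis=(2, 3), keepdims=True)   # J_ij(., .)
    J_zs = J - J_row - J_col + J_all
    # Field correction: sum_{j != i} [J_ij(a, .) - J_ij(., .)];
    # the zero diagonal blocks make the j == i term vanish.
    corr = (J_row - J_all)[..., 0].sum(axis=1)   # shape (L, q)
    h_zs = h - h.mean(axis=1, keepdims=True) + corr
    return h_zs, J_zs


# Demo on a random model (arbitrary sizes, not taken from the thesis).
rng = np.random.default_rng(0)
L, q = 5, 3
h = rng.normal(size=(L, q))
J = rng.normal(size=(L, L, q, q))
J = (J + J.transpose(1, 0, 3, 2)) / 2            # enforce J_ij(a, b) = J_ji(b, a)
for i in range(L):
    J[i, i] = 0.0                                # no self-couplings

h_zs, J_zs = zero_sum_gauge(h, J)
seqs = rng.integers(0, q, size=(10, L))
offsets = [energy(s, h, J) - energy(s, h_zs, J_zs) for s in seqs]
# Every sequence's energy shifts by the same constant.
print(np.allclose(offsets, offsets[0]))          # expected: True
```

Because all energies shift by the same constant, the Boltzmann probabilities of the two parameter sets are identical, even though (h, J) and (h_zs, J_zs) differ entry by entry. In the gauged parameters every field vector and every row and column of each coupling block sums to zero, which is what makes two models expressed in the same gauge comparable parameter by parameter.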

Domains

Computer Science [cs]

Embargoed file: available from Sunday, January 11, 2026

Contributor: François Coste

https://inria.hal.science/hal-04388795

Submitted on: Thursday, January 11, 2024, 4:35:01 PM

Last modification on: Thursday, January 18, 2024, 2:55:43 PM

Dates and versions

hal-04388795, version 1 (11-01-2024)

Licence

Attribution

Identifiers

HAL Id: hal-04388795, version 1

Cite

Pablo Espana Gutierrez. Learning canonical Potts models. Computer Science [cs]. 2023. ⟨hal-04388795⟩

Collections

CNRS INRIA INSA-RENNES IRISA CENTRALESUPELEC INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM