Learning Automata on Protein Sequences - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2006

Learning Automata on Protein Sequences

Abstract

Pattern discovery is limited to position-specific characterizations, such as Prosite patterns or profile HMMs, which cannot handle, for instance, dependencies between amino acids that are distant in the sequence of a protein but close in its three-dimensional structure. To overcome these limitations, we propose to learn automata on proteins. Inspired by grammatical inference and multiple alignment techniques, we introduce a sequence-driven approach based on merging ordered partial local multiple alignments (PLMA) under preservation or consistency constraints, and on identifying informative positions with respect to physico-chemical properties. The quality of the characterization is assessed experimentally on two difficult sets of proteins by comparison with (semi-)manually designed Prosite patterns and with state-of-the-art pattern discovery algorithms. Further leave-one-out experiments show that learning more precise automata improves accuracy by increasing the classification margins.
Main file: coste_kerbellec_jobim06.pdf (403.58 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00180429, version 1 (19-10-2007)

Identifiers

  • HAL Id: inria-00180429, version 1

Cite

François Coste, Goulven Kerbellec. Learning Automata on Protein Sequences. JOBIM, Jul 2006, Bordeaux, France. pp. 199-210. ⟨inria-00180429⟩