Conference papers

Learning Automata on Protein Sequences

François Coste (1), Goulven Kerbellec (1)
(1) SYMBIOSE - Biological systems and models, bioinformatics and sequences
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: Pattern discovery is limited to position-specific characterizations, such as Prosite patterns or profile HMMs, which cannot handle, for instance, dependencies between amino acids that are distant in the protein sequence but close in its three-dimensional structure. To overcome these limitations, we propose learning automata on protein sequences. Inspired by grammatical inference and multiple alignment techniques, we introduce a sequence-driven approach based on merging ordered partial local multiple alignments (PLMA) under preservation or consistency constraints, and on identifying informative positions with respect to physico-chemical properties. The quality of the resulting characterization is assessed experimentally on two difficult protein sets, by comparison with the (semi-)manually designed patterns of Prosite and with state-of-the-art pattern discovery algorithms. Further leave-one-out experiments show that learning more precise automata improves accuracy by increasing the classification margins.
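The limitation described above can be illustrated with a toy example. This is not the authors' PLMA-based learning method: the motif, the automaton states and the residues below are hypothetical, chosen only to show why a position-specific pattern cannot encode a dependency between two positions while a small automaton can. The pattern `[CH]-x(3)-[CH]` (written in Prosite syntax and translated to a regular expression) constrains each end position independently, so it also accepts the "mixed" segments `C-x(3)-H` and `H-x(3)-C`. The automaton keeps one branch per opening residue and therefore accepts only segments whose two ends agree.

```python
import re

# Hypothetical position-specific pattern [CH]-x(3)-[CH] (Prosite syntax),
# translated to a regex: each position is constrained independently.
POSITION_SPECIFIC = re.compile(r"[CH].{3}[CH]")

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def build_dfa():
    """Build a toy deterministic automaton as a (state, residue) -> state map.

    The state names are hypothetical. Reading C or H first selects a
    branch ("c" or "h"), so the automaton "remembers" the opening residue
    and only accepts C-x(3)-C or H-x(3)-H: the two ends co-vary.
    """
    transitions = {("start", "C"): "c0", ("start", "H"): "h0"}
    for branch in ("c", "h"):
        # Three arbitrary residues: branch0 -> branch1 -> branch2 -> branch3.
        for i in range(3):
            for aa in AMINO_ACIDS:
                transitions[(f"{branch}{i}", aa)] = f"{branch}{i + 1}"
    # The closing residue must match the opening one.
    transitions[("c3", "C")] = "accept"
    transitions[("h3", "H")] = "accept"
    return transitions


def dfa_accepts(transitions, segment):
    """Return True if the whole segment drives the automaton to 'accept'."""
    state = "start"
    for aa in segment:
        state = transitions.get((state, aa))
        if state is None:  # no transition defined: reject
            return False
    return state == "accept"


dfa = build_dfa()
for segment in ["CAAAC", "HAAAH", "CAAAH", "HAAAC"]:
    print(segment,
          bool(POSITION_SPECIFIC.fullmatch(segment)),
          dfa_accepts(dfa, segment))
```

Both models accept `CAAAC` and `HAAAH`, but only the position-specific pattern accepts `CAAAH` and `HAAAC`. In real proteins the correlated positions may be much farther apart and the automaton would be learned from data rather than written by hand; the sketch only shows the kind of dependency an automaton can represent.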

Cited literature: 35 references
Contributor: François Coste
Submitted on: Friday, October 19, 2007 - 10:25:08 AM
Last modification on: Friday, February 4, 2022 - 3:15:11 AM
Long-term archiving on: Sunday, April 11, 2010 - 11:18:24 PM


Files produced by the author(s)


HAL Id: inria-00180429, version 1


François Coste, Goulven Kerbellec. Learning Automata on Protein Sequences. JOBIM, Jul 2006, Bordeaux, France. pp. 199–210. ⟨inria-00180429⟩


