Learning Automata on Protein Sequences

François Coste (1), Goulven Kerbellec (1)
(1) SYMBIOSE - Biological systems and models, bioinformatics and sequences, IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : Pattern discovery is limited to position-specific characterizations such as Prosite patterns or profile HMMs, which cannot handle, for instance, dependencies between amino acids that are distant in the sequence of a protein but close in its three-dimensional structure. To overcome these limitations, we propose to learn automata on proteins. Inspired by grammatical inference and multiple alignment techniques, we introduce a sequence-driven approach based on merging ordered partial local multiple alignments (PLMA) under preservation or consistency constraints, and on identifying informative positions with respect to physico-chemical properties. The quality of the characterization is assessed experimentally on two difficult sets of proteins by comparison with (semi-)manually designed Prosite patterns and with state-of-the-art pattern discovery algorithms. Further leave-one-out experiments show that learning more precise automata improves accuracy by increasing the classification margins.
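As a toy illustration of the limitation mentioned in the abstract (not the authors' algorithm), the Python sketch below contrasts a position-specific pattern with a small hand-built automaton. The pattern, the automaton, the state numbering and the function names build_toy_automaton and accepts are all hypothetical, chosen only to show that separate paths in an automaton can tie the residue at one position to the residue at a distant one, which a pattern with independent positions cannot express.

```python
import re

# Toy position-specific pattern, in the spirit of a Prosite pattern
# [KR]-x(2)-[DE]: each position is constrained independently.
POSITION_SPECIFIC = re.compile(r"[KR]..[DE]")

# The 20 standard amino acids (one-letter codes).
ANY = "ACDEFGHIKLMNPQRSTVWY"


def build_toy_automaton():
    """Hypothetical automaton with two parallel branches.

    The branch taken after the first residue is remembered by the state,
    so the last residue depends on the first one:
    K must be paired with E, and R must be paired with D.
    """
    transitions = {0: {"K": 1, "R": 4}}
    transitions[1] = {a: 2 for a in ANY}   # K branch, any residue
    transitions[2] = {a: 3 for a in ANY}   # K branch, any residue
    transitions[3] = {"E": 7}              # K branch ends with E
    transitions[4] = {a: 5 for a in ANY}   # R branch, any residue
    transitions[5] = {a: 6 for a in ANY}   # R branch, any residue
    transitions[6] = {"D": 7}              # R branch ends with D
    return transitions, 0, {7}


def accepts(automaton, seq):
    """Return True if the deterministic automaton accepts the whole sequence."""
    transitions, state, finals = automaton
    for symbol in seq:
        state = transitions.get(state, {}).get(symbol)
        if state is None:
            return False
    return state in finals


if __name__ == "__main__":
    automaton = build_toy_automaton()
    for seq in ("KAAE", "RAAD", "KAAD", "RAAE"):
        print(seq, bool(POSITION_SPECIFIC.fullmatch(seq)), accepts(automaton, seq))
```

In this sketch the position-specific pattern matches all four test sequences, while the automaton accepts only the two in which the first and last residues are correctly paired.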
Document type :
Conference papers
Alain Denise, Pascal Durrens, Stéphane Robin, Eduardo Rocha, Antoine de Daruvar and Alexis Groppi. JOBIM, Jul 2006, Bordeaux, France. pp. 199-210, 2006

Cited literature [35 references]

https://hal.inria.fr/inria-00180429
Contributor : François Coste
Submitted on : Friday, October 19, 2007 - 10:25:08 AM
Last modification on : Wednesday, November 29, 2017 - 4:23:49 PM
Document(s) archived on : Sunday, April 11, 2010 - 11:18:24 PM

Files

coste_kerbellec_jobim06.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00180429, version 1

Citation

François Coste, Goulven Kerbellec. Learning Automata on Protein Sequences. Alain Denise, Pascal Durrens, Stéphane Robin, Eduardo Rocha, Antoine de Daruvar and Alexis Groppi. JOBIM, Jul 2006, Bordeaux, France. pp. 199-210, 2006. 〈inria-00180429〉
