Learning Automata on Protein Sequences

François Coste (1), Goulven Kerbellec (1)
(1) SYMBIOSE - Biological systems and models, bioinformatics and sequences
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: Existing pattern discovery approaches are limited to position-specific characterizations, such as Prosite patterns or profile HMMs, which cannot handle, for instance, dependencies between amino acids that are distant in the sequence of a protein but close in its three-dimensional structure. To overcome these limitations, we propose to learn automata on proteins. Inspired by grammatical inference and multiple alignment techniques, we introduce a sequence-driven approach based on merging ordered partial local multiple alignments (PLMA) under preservation or consistency constraints, and on identifying informative positions with respect to physico-chemical properties. The quality of the characterization is assessed experimentally on two difficult sets of proteins, by comparison with (semi-)manually designed Prosite patterns and with state-of-the-art pattern discovery algorithms. Further leave-one-out experiments show that learning more precise automata improves accuracy by increasing the classification margins.
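To make the motivating limitation concrete, here is a toy sketch; it is not the PLMA-merging method described in the abstract. It assumes a made-up family in which a protein either starts and ends with C, or starts and ends with S, with any residues in between. This stands in for a dependency between distant positions. A small deterministic automaton can remember which branch it entered. A hypothetical position-independent check, which constrains the first and last residues separately, cannot remember this and also accepts mixed C...S sequences. All names (`step`, `accepts`, `position_independent_accepts`) are illustrative assumptions.

```python
# Toy illustration only (hypothetical, not the authors' algorithm).
# The 20 standard amino-acid one-letter codes.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")

# Accepting states: we are in the C (or S) branch and the last residue read
# matches the residue that opened the branch.
ACCEPTING = {"endC", "endS"}

def step(state, aa):
    """Return the next DFA state, or None when no transition exists (reject)."""
    if aa not in AMINO_ACIDS:
        return None
    if state == "start":
        # The first residue chooses the branch; only C or S may open one.
        if aa == "C":
            return "inC"
        if aa == "S":
            return "inS"
        return None
    branch = state[-1]  # "C" or "S": the residue remembered in the state
    return ("end" if aa == branch else "in") + branch

def accepts(seq):
    """Run the automaton over the whole sequence."""
    state = "start"
    for aa in seq:
        state = step(state, aa)
        if state is None:
            return False
    return state in ACCEPTING

def position_independent_accepts(seq):
    """First and last residues are each constrained to {C, S} independently."""
    return (len(seq) >= 2 and set(seq) <= AMINO_ACIDS
            and seq[0] in "CS" and seq[-1] in "CS")

print(accepts("CAAC"), accepts("CAAS"))    # True False
print(position_independent_accepts("CAAS"))  # True: the dependency is lost
```

The automaton encodes the dependency in its state (`inC` vs `inS`), so the gap between the two correlated residues can have any length. The per-position check has no such memory.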
Document type:
Conference paper
Alain Denise, Pascal Durrens, Stéphane Robin, Eduardo Rocha, Antoine de Daruvar and Alexis Groppi. JOBIM, Jul 2006, Bordeaux, France. pp. 199-210, 2006

Cited literature: 35 references

https://hal.inria.fr/inria-00180429
Contributor: François Coste
Submitted on: Friday, 19 October 2007, 10:25:08
Last modified on: Wednesday, 16 May 2018, 11:23:05
Document(s) archived on: Sunday, 11 April 2010, 23:18:24

Files

coste_kerbellec_jobim06.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00180429, version 1

Citation

François Coste, Goulven Kerbellec. Learning Automata on Protein Sequences. Alain Denise, Pascal Durrens, Stéphane Robin, Eduardo Rocha, Antoine de Daruvar and Alexis Groppi. JOBIM, Jul 2006, Bordeaux, France. pp. 199-210, 2006. 〈inria-00180429〉


Metrics

Record views: 353

File downloads: 243