inria-00180429, version 1
Learning Automata on Protein Sequences
François Coste (a, 1), Goulven Kerbellec (a, 1)
JOBIM (2006) 199--210
Abstract: Pattern discovery is limited to position-specific characterizations, such as Prosite patterns or profile HMMs, which cannot handle, for instance, dependencies between amino acids that are distant in the sequence of a protein but close in its three-dimensional structure. To overcome these limitations, we propose to learn automata on proteins. Inspired by grammatical inference and multiple alignment techniques, we introduce a sequence-driven approach based on merging ordered partial local multiple alignments (PLMA) under preservation or consistency constraints, and on identifying informative positions with respect to physico-chemical properties. The quality of the characterization is assessed experimentally on two difficult sets of proteins by comparison with the (semi-)manually designed patterns of Prosite and with state-of-the-art pattern discovery algorithms. Further leave-one-out experiments show that learning more precise automata improves accuracy by increasing the classification margins.
- a – INRIA
- 1: SYMBIOSE (INRIA - IRISA)
- CNRS : UMR6074 – INRIA – INSA Rennes – Université de Rennes 1
- Domain: Computer Science/Learning; Life Sciences/Quantitative Methods
- Keywords: Grammatical Inference – Automata – Proteins – Pattern Discovery
- http://hal.inria.fr/inria-00180429
- oai:hal.inria.fr:inria-00180429
- From: François Coste
- Submitted on: Friday, 19 October 2007 10:25:08
- Updated on: Friday, 19 October 2007 11:10:40





