inria-00180429, version 1

Learning Automata on Protein Sequences

François Coste a1, Goulven Kerbellec a1

JOBIM (2006) 199--210

Abstract: Pattern discovery is limited to position-specific characterizations, such as Prosite patterns or profile HMMs, which cannot handle, for instance, dependencies between amino acids that are distant in the sequence of a protein but close in its three-dimensional structure. To overcome these limitations, we propose to learn automata on proteins. Inspired by grammatical inference and multiple alignment techniques, we introduce a sequence-driven approach based on merging ordered partial local multiple alignments (PLMA) under preservation or consistency constraints, and on identifying informative positions with respect to physico-chemical properties. The quality of the characterization is assessed experimentally on two difficult sets of proteins by comparison with the (semi-)manually designed patterns of Prosite and with state-of-the-art pattern discovery algorithms. Further leave-one-out experiments show that learning more precise automata improves accuracy by increasing the classification margins.

  • a –  INRIA
  • 1:  SYMBIOSE (INRIA - IRISA)
  • CNRS : UMR6074 – INRIA – INSA Rennes – Université de Rennes 1
  • Domain : Computer Science/Learning
    Life Sciences/Quantitative Methods
  • Keywords : Grammatical Inference – Automata – Proteins – Pattern Discovery
 
  • oai:hal.inria.fr:inria-00180429
  • Submitted on: Friday, 19 October 2007 10:25:08
  • Updated on: Friday, 19 October 2007 11:10:40