Local Substitutability for Sequence Generalization

François Coste (1), Gaelle Garet (1), Jacques Nicolas (2)
1 SYMBIOSE - Biological systems and models, bioinformatics and sequences
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
2 Dyliss - Dynamics, Logics and Inference for biological Systems and Sequences
Inria Rennes – Bretagne Atlantique, IRISA-D7 - Data and Knowledge Management
Abstract: Genomic banks are continuously fed with large sets of DNA or RNA sequences produced by high-throughput machines. Protein annotation is a task of primary importance for these banks. It consists of retrieving the genes that code for proteins within the sequences and then predicting the function of these new proteins in the cell by comparison with known families. Many methods have been designed to characterize protein families and discover new members, mainly based on subsets of regular expressions or simple Hidden Markov Models. We are interested in more expressive models that are able to capture the long-range characteristic interactions occurring in the spatial structure of the analyzed protein family. Starting from the work of Clark and Eyraud (2007) and Yoshinaka (2008) on the inference of substitutable and k,l-substitutable languages respectively, we introduce new classes of substitutable languages that use local rather than global substitutability, a reasonable assumption for protein structures, in order to enhance the inductive leaps performed by least general generalization approaches. The concepts are illustrated by a first experiment on a real protein sequence set.
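As background for the starting point named in the abstract: Clark and Eyraud (2007) call two non-empty strings u and v weakly substitutable in a language L when some context (l, r) exists such that both lur and lvr belong to L, and a language is substitutable when weak substitutability implies that u and v share all their contexts. The minimal Python sketch below assumes that a finite sample of strings stands in for L and computes the pairs of substrings that are weakly substitutable in that sample. It illustrates only the global relation the paper starts from, not the local substitutability introduced by the authors; the function names are hypothetical.

```python
from itertools import combinations


def substrings_with_contexts(sample):
    """Map each non-empty substring u to the set of contexts (l, r) such that l + u + r is in the sample."""
    contexts = {}
    for w in sample:
        n = len(w)
        for i in range(n):
            for j in range(i + 1, n + 1):
                # w[:i] is the left context, w[j:] the right context of w[i:j]
                contexts.setdefault(w[i:j], set()).add((w[:i], w[j:]))
    return contexts


def weakly_substitutable_pairs(sample):
    """Return pairs (u, v), u < v, of distinct substrings sharing at least one context in the sample."""
    ctx = substrings_with_contexts(sample)
    pairs = set()
    for u, v in combinations(sorted(ctx), 2):
        if ctx[u] & ctx[v]:  # a common (l, r) means l+u+r and l+v+r both occur
            pairs.add((u, v))
    return pairs


print(sorted(weakly_substitutable_pairs(["ab", "acb"])))
```

For the sample ["ab", "acb"], the shared empty context ("", "") makes ab and acb weakly substitutable (as are a and ac, and b and cb). Under the substitutability assumption, a learner generalizes by treating such strings as interchangeable in every context, which is the kind of inductive leap the abstract refers to.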
Document type:
Conference paper
Jeffrey Heinz and Colin de la Higuera and Tim Oates. ICGI 2012, Sep 2012, Washington, United States. MIT Press, 21, pp.97-111, 2012, JMLR Workshop and Conference Proceedings. 〈http://jmlr.csail.mit.edu/proceedings/papers/v21/coste12a/coste12a.pdf〉

https://hal.inria.fr/hal-00730553
Contributor: Jacques Nicolas
Submitted on: Monday, 10 September 2012, 15:59:55
Last modified on: Wednesday, 16 May 2018, 11:23:35

Identifiers

  • HAL Id: hal-00730553, version 1

Citation

François Coste, Gaelle Garet, Jacques Nicolas. Local Substitutability for Sequence Generalization. Jeffrey Heinz and Colin de la Higuera and Tim Oates. ICGI 2012, Sep 2012, Washington, United States. MIT Press, 21, pp.97-111, 2012, JMLR Workshop and Conference Proceedings. 〈http://jmlr.csail.mit.edu/proceedings/papers/v21/coste12a/coste12a.pdf〉. 〈hal-00730553〉
