Conference paper. Year: 2012

Local Substitutability for Sequence Generalization

Abstract

Genomic banks are fed continuously by large sets of DNA or RNA sequences coming from high-throughput machines. Protein annotation is a task of first importance with respect to these banks. It consists of retrieving the genes that code for proteins within the sequences and then predicting the function of these new proteins in the cell by comparison with known families. Many methods have been designed to characterize protein families and discover new members, mainly based on subsets of regular expressions or simple Hidden Markov Models. We are interested in more expressive models that are able to capture the long-range characteristic interactions occurring in the spatial structure of the analyzed protein family. Starting from the work of Clark and Eyraud (2007) and Yoshinaka (2008) on inference of substitutable and k,l-substitutable languages respectively, we introduce new classes of substitutable languages using local rather than global substitutability, a reasonable assumption with respect to protein structures, to enhance inductive leaps performed by least generalized generalization approaches. The concepts are illustrated on a first experiment using a real protein sequence set.
File not deposited

Dates and versions

hal-00730553, version 1 (10-09-2012)

Identifiers

  • HAL Id: hal-00730553, version 1

Cite

François Coste, Gaelle Garet, Jacques Nicolas. Local Substitutability for Sequence Generalization. ICGI 2012, University of Maryland, Sep 2012, Washington, United States. pp.97-111. ⟨hal-00730553⟩