inria-00536633, version 1
Searching for Smallest Grammars on Large Sequences and Application to DNA
Rafael Carrascosa a, 1François Coste
b, 2, 3Matthias Gallé
c, 2, 3Gabriel Infante-Lopez 1
Journal of Discrete Algorithms 11 (2012) 62-72
Résumé : Motivated by the inference of the structure of genomic sequences, we address here the smallest grammar problem. In previous work, we introduced a new perspective on this problem, splitting the task into two different optimization problems: choosing which words will be considered constituents of the final grammar and finding a minimal parsing with these constituents. Here we focus on making these ideas applicable on large sequences. First, we improve the complexity of existing algorithms by using the concept of maximal repeats when choosing which substrings will be the constituents of the grammar. Then, we improve the size of the grammars by cautiously adding a minimal parsing optimization step. Together, these approaches enable us to propose new practical algorithms that return smaller grammars (up to 10\%) in approximately the same amount of time than their competitors on a classical set of genomic sequences and on whole genomes of model organisms.
- a – Universidad Nacional de Córdoba
- b – INRIA
- c – Université de Rennes I
- 1 : Grupo de Procesamiento de Lenguaje Natural (PLN - FaMAF)
- Universidad Nacional de Córdoba
- 2 : SYMBIOSE (INRIA - IRISA)
- CNRS : UMR6074 – INRIA – Institut National des Sciences Appliquées (INSA) - Rennes – Université de Rennes 1
- 3 : DYLISS (INRIA - IRISA)
- INRIA – CNRS : UMR6074 – Université de Rennes 1
- Collaboration : MINCyT- INRIA
- Domaine : Sciences du Vivant/Bio-Informatique, Biologie Systémique
Informatique/Bio-informatique
Informatique/Apprentissage
Informatique/Théorie de l'information et codage
Mathématiques/Théorie de l'information et codage
- inria-00536633, version 1
- http://hal.inria.fr/inria-00536633
- oai:hal.inria.fr:inria-00536633
- Contributeur : François Coste
- Soumis le : Mardi 9 Octobre 2012, 14:34:48
- Dernière modification le : Mardi 4 Décembre 2012, 15:44:47






Documents associés
Exporter