Choosing Word Occurrences for the Smallest Grammar Problem

Rafael Carrascosa 1, François Coste 2,*, Matthias Gallé 2, G. Infante-Lopez 1
* Corresponding author
2 SYMBIOSE - Biological systems and models, bioinformatics and sequences
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: The smallest grammar problem, namely finding a smallest context-free grammar that generates exactly one sequence, is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. We propose to focus on the choice of the occurrences to be rewritten by non-terminals. We extend classical offline algorithms by introducing a global optimization of this choice at each step of the algorithm. This approach allows us to define the search space of a smallest grammar by separating the choice of the non-terminals from the choice of their occurrences. We propose a second algorithm that performs a broader exploration by allowing the removal of useless words that were chosen previously. Experiments on a classical benchmark show that our algorithms consistently find smaller grammars than state-of-the-art algorithms.
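As an illustration of the problem statement only (not of the algorithms proposed in the paper), the following minimal Python sketch represents a grammar that generates exactly one sequence and compares the size of two such grammars. The dictionary representation, the function names `expand` and `grammar_size`, the example sequence, and the size measure (total length of all right-hand sides) are assumptions made for this sketch.

```python
# Hypothetical representation: each non-terminal maps to its single
# right-hand side, so the grammar derives exactly one string.
def expand(grammar, symbol):
    """Return the unique string derived from `symbol`."""
    if symbol not in grammar:  # terminal symbol: derives itself
        return symbol
    return "".join(expand(grammar, s) for s in grammar[symbol])


def grammar_size(grammar):
    """Assumed size measure: total length of all right-hand sides."""
    return sum(len(rhs) for rhs in grammar.values())


sequence = "abcabcabc_abcabc"

# Trivial grammar: the start symbol spells out the whole sequence.
trivial = {"S": list(sequence)}

# Smaller grammar: every occurrence of the word "abc" is rewritten
# by the non-terminal A.
smaller = {
    "S": ["A", "A", "A", "_", "A", "A"],
    "A": ["a", "b", "c"],
}

for name, g in (("trivial", trivial), ("smaller", smaller)):
    print(name, expand(g, "S") == sequence, grammar_size(g))
```

Both grammars derive the same sequence, but rewriting the repeated word by a non-terminal shortens the total description. Which words to introduce, and which of their occurrences to rewrite, are the choices the abstract refers to.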
Contributor: François Coste
Submitted on: Tuesday, 27 April 2010 - 13:52:25
Last modified on: Thursday, 11 January 2018 - 06:20:10


  • HAL Id: inria-00476840, version 1



Rafael Carrascosa, François Coste, Matthias Gallé, G. Infante-Lopez. Choosing Word Occurrences for the Smallest Grammar Problem. LATA, May 2010, Trier, Germany. 2010. 〈inria-00476840〉


