Choosing Word Occurrences for the Smallest Grammar Problem

Rafael Carrascosa 1, François Coste 2,*, Matthias Gallé 2, G. Infante-Lopez 1
* Corresponding author
2 SYMBIOSE - Biological systems and models, bioinformatics and sequences
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: The smallest grammar problem, namely finding a smallest context-free grammar that generates exactly one sequence, is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. We propose to focus on the choice of the occurrences to be rewritten by non-terminals. We extend classical offline algorithms by introducing a global optimization of this choice at each step of the algorithm. This approach allows us to define the search space of a smallest grammar by separating the choice of the non-terminals from the choice of their occurrences. We propose a second algorithm that performs a broader exploration by allowing the removal of useless words that were chosen previously. Experiments on a classical benchmark show that our algorithms consistently find smaller grammars than state-of-the-art algorithms.
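
To make the problem statement concrete, here is a minimal Python sketch (not taken from the paper). It represents a grammar that generates exactly one sequence as a dictionary, measures its size, and rewrites a chosen set of occurrences of a word by a non-terminal. The function names, the dictionary representation, and the size measure (right-hand-side length plus one per rule) are assumptions made for illustration only.

```python
def expand(grammar, symbol="S"):
    """Expand a grammar with one rule per non-terminal into its single string."""
    out = []
    for s in grammar[symbol]:
        # Symbols that have a rule are non-terminals; the rest are terminals.
        out.append(expand(grammar, s) if s in grammar else s)
    return "".join(out)


def grammar_size(grammar):
    """Assumed size measure: length of every right-hand side, plus one per rule."""
    return sum(len(rhs) + 1 for rhs in grammar.values())


def replace_occurrences(seq, word, nonterminal, starts):
    """Rewrite the chosen, non-overlapping occurrences of `word` in `seq`.

    `starts` lists the start positions selected for rewriting; occurrences
    that are not selected are left untouched.
    """
    chosen = set(starts)
    out, i = [], 0
    while i < len(seq):
        if i in chosen and seq[i:i + len(word)] == word:
            out.append(nonterminal)
            i += len(word)
        else:
            out.append(seq[i])
            i += 1
    return out


sequence = list("abcabcabc")
word = list("abc")

# Trivial grammar: a single rule that spells out the sequence.
flat = {"S": sequence}

# Grammar obtained by rewriting all three occurrences of "abc" by N.
compact = {"S": replace_occurrences(sequence, word, "N", [0, 3, 6]), "N": word}

# Grammar obtained by rewriting only the first and last occurrence.
partial = {"S": replace_occurrences(sequence, word, "N", [0, 6]), "N": word}
```

All three grammars expand to the same sequence, but their sizes differ under the assumed measure. Which occurrences get rewritten therefore affects the result, independently of which words are promoted to non-terminals.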

https://hal.inria.fr/inria-00476840
Submitted on: Tuesday, 27 April 2010 - 13:52:25
Last modified on: Wednesday, 4 October 2017 - 16:08:21

Identifiers

  • HAL Id: inria-00476840, version 1

Citation

Rafael Carrascosa, François Coste, Matthias Gallé, G. Infante-Lopez. Choosing Word Occurrences for the Smallest Grammar Problem. LATA, May 2010, Trier, Germany. 2010. 〈inria-00476840〉
