Skip to Main content Skip to Navigation
New interface
Conference papers

Choosing Word Occurrences for the Smallest Grammar Problem

Rafael Carrascosa 1 François Coste 2, * Matthias Gallé 2 G. Infante-Lopez 1 
* Corresponding author
1 Equipe d'Informatique Linguistique - Grupo de Procesamiento de Lenguaje Natural (PLN)
LIGM - Laboratoire d'Informatique Gaspard-Monge, FaMAF - Facultad de Matemática, Astronomía y Física [Cordoba]
2 SYMBIOSE - Biological systems and models, bioinformatics and sequences
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : The smallest grammar problem - namely, finding a smallest context-free grammar that generates exactly one sequence - is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. We propose to focus on the choice of the occurrences to be rewritten by non-terminals. We extend classical offline algorithms by introducing a global optimization of this choice at each step of the algorithm. This approach allows us to define the search space of a smallest grammar by separating the choice of the non-terminals and the choice of their occurrences. We propose a second algorithm that performs a broader exploration by allowing the removal of useless words that were chosen previously. Experiments on a classical benchmark show that our algorithms consistently find smaller grammars then state-of-the-art algorithms.
Complete list of metadata
Contributor : François Coste Connect in order to contact the contributor
Submitted on : Tuesday, April 27, 2010 - 1:52:25 PM
Last modification on : Friday, February 4, 2022 - 3:12:28 AM


  • HAL Id : inria-00476840, version 1


Rafael Carrascosa, François Coste, Matthias Gallé, G. Infante-Lopez. Choosing Word Occurrences for the Smallest Grammar Problem. LATA, May 2010, Trier, Germany. ⟨inria-00476840⟩



Record views