inria-00476840, version 1
Choosing Word Occurrences for the Smallest Grammar Problem
Rafael Carrascosa a, 1François Coste
b, 2Matthias Gallé
b, 2G. Infante-Lopez 1
LATA (2010)
Résumé : The smallest grammar problem - namely, finding a smallest context-free grammar that generates exactly one sequence - is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. We propose to focus on the choice of the occurrences to be rewritten by non-terminals. We extend classical offline algorithms by introducing a global optimization of this choice at each step of the algorithm. This approach allows us to define the search space of a smallest grammar by separating the choice of the non-terminals and the choice of their occurrences. We propose a second algorithm that performs a broader exploration by allowing the removal of useless words that were chosen previously. Experiments on a classical benchmark show that our algorithms consistently find smaller grammars then state-of-the-art algorithms.
- a – Universidad Nacional de Córdoba
- b – INRIA
- 1 : Grupo de Procesamiento de Lenguaje Natural (PLN - FaMAF)
- Universidad Nacional de Córdoba
- 2 : SYMBIOSE (INRIA - IRISA)
- CNRS : UMR6074 – INRIA – INSA Rennes – Université de Rennes 1
- Domaine : Sciences du Vivant/Bio-Informatique, Biologie Systémique
Informatique/Algorithme et structure de données
Informatique/Bio-informatique
Informatique/Théorie et langage formel
Informatique/Intelligence artificielle
- inria-00476840, version 1
- http://hal.inria.fr/inria-00476840
- oai:hal.inria.fr:inria-00476840
- Contributeur : François Coste
- Soumis le : Mardi 27 Avril 2010, 13:52:25
- Dernière modification le : Jeudi 29 Avril 2010, 13:43:00






Exporter