The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing

Rafael Carrascosa 1 François Coste 2 Matthias Gallé 2 Gabriel Infante-Lopez 1
1 Equipe d'Informatique Linguistique - Grupo de Procesamiento de Lenguaje Natural (PLN)
LIGM - Laboratoire d'Informatique Gaspard-Monge, FaMAF - Facultad de Matemática, Astronomía y Física [Cordoba]
2 SYMBIOSE - Biological systems and models, bioinformatics and sequences
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : The smallest grammar problem--namely, finding a smallest context-free grammar that generates exactly one sequence--is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. We propose a new perspective on this problem by splitting it into two tasks: (1) choosing which words will be the constituents of the grammar and (2) searching for the smallest grammar given this set of constituents. We show how to solve the second task in polynomial time parsing longer constituent with smaller ones. We propose new algorithms based on classical practical algorithms that use this optimization to find small grammars. Our algorithms consistently find smaller grammars on a classical benchmark reducing the size in 10% in some cases. Moreover, our formulation allows us to define interesting bounds on the number of small grammars and to empirically compare different grammars of small size.
Liste complète des métadonnées

Littérature citée [25 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/inria-00638445
Contributeur : François Coste <>
Soumis le : vendredi 4 novembre 2011 - 17:50:26
Dernière modification le : mercredi 11 avril 2018 - 12:12:04
Document(s) archivé(s) le : dimanche 5 février 2012 - 02:29:25

Fichier

algorithms-04-00262-v2.pdf
Fichiers éditeurs autorisés sur une archive ouverte

Identifiants

Citation

Rafael Carrascosa, François Coste, Matthias Gallé, Gabriel Infante-Lopez. The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing. Algorithms, MDPI AG, 2011, pp.262-284. 〈http://www.mdpi.com//1999-4893/4/4/262/〉. 〈10.3390/a4040262〉. 〈inria-00638445〉

Partager

Métriques

Consultations de la notice

240

Téléchargements de fichiers

227