Deterministic Regular Expressions in Linear Time

Abstract : Deterministic regular expressions are widely used in XML processing. For instance, all regular expressions in DTDs and XML Schemas are required to be deterministic. In this paper we show that determinism of a regular expression e can be tested in linear time. The best known algorithms, based on the Glushkov automaton, require O(σ|e|) time, where σ is the number of distinct symbols in e. We fur- ther show that matching a word w against an expression e can be achieved in combined linear time O(|e| + |w|), for a wide range of deterministic regular expressions: (i) star-free (for multiple input words), (ii) bounded-occurrence, i.e., ex- pressions in which each symbol appears a bounded number of times, and (iii) bounded plus-depth, i.e., expressions in which the nesting depth of alternating plus (union) and con- catenation symbols is bounded. Our algorithms use a new structural decomposition of the parse tree of e. For match- ing arbitrary deterministic regular expressions we present an O(|e| + |w| log log |e|) time algorithm.
Type de document :
Communication dans un congrès
PODS-31th ACM Symposium on Principles of Database Systems, 2012, Scottsdale, United States. pp.12, 2012
Liste complète des métadonnées

Littérature citée [27 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/inria-00618451
Contributeur : Inria Mostrare <>
Soumis le : vendredi 16 mars 2012 - 14:43:37
Dernière modification le : jeudi 11 janvier 2018 - 06:22:13
Document(s) archivé(s) le : mardi 13 décembre 2016 - 18:33:38

Fichier

1.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : inria-00618451, version 1

Collections

Citation

Benoit Groz, Sebastian Maneth, Slawomir Staworko. Deterministic Regular Expressions in Linear Time. PODS-31th ACM Symposium on Principles of Database Systems, 2012, Scottsdale, United States. pp.12, 2012. 〈inria-00618451〉

Partager

Métriques

Consultations de la notice

954

Téléchargements de fichiers

716