Online Dictionary Matching for Streams of XML Documents

Abstract : We consider the online multiple-pattern matching problem for streams of XML documents, when the patterns are expressed as linear XPath expressions containing child operators (/), descendant operators (//) and wildcards (*) but no predicates. For each document in the stream, the task is to determine all occurrences in the document of all the patterns. We present a general multiple-pattern-matching algorithm that is based on a backtracking deterministic finite automaton derived from the classic Aho-Corasick pattern-matching automaton. This automaton is of size linear in the sum of the sizes of the XPath patterns, and the worst-case time bound of the algorithm is better than the time bound of the simulation of linear-size nondeterministic automata. In addition to the worst-case-efficient general solution we present an algorithm with a simple backtracking mechanism that works extremely well for cases in which the backtracking stack remains low. Our experiments show that, when applied to filtering, this simple algorithm scales well as regards the number of patterns (or filters) and is competitive with YFilter, a widely accepted software for XML filtering.
Type de document :
Communication dans un congrès
Cristian S. Calude; Vladimiro Sassone. 6th IFIP TC 1/WG 2.2 International Conference on Theoretical Computer Science (TCS) / Held as Part of World Computer Congress (WCC), Sep 2010, Brisbane, Australia. Springer, IFIP Advances in Information and Communication Technology, AICT-323, pp.153-164, 2010, Theoretical Computer Science. 〈10.1007/978-3-642-15240-5_12〉
Liste complète des métadonnées

Littérature citée [13 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/hal-01054462
Contributeur : Hal Ifip <>
Soumis le : mercredi 6 août 2014 - 16:26:07
Dernière modification le : mercredi 9 août 2017 - 12:03:13
Document(s) archivé(s) le : mercredi 26 novembre 2014 - 01:01:28

Fichier

03230155.pdf
Fichiers produits par l'(les) auteur(s)

Licence


Distributed under a Creative Commons Paternité 4.0 International License

Identifiants

Citation

Panu Silvasti, Seppo Sippu, Eljas Soisalon-Soininen. Online Dictionary Matching for Streams of XML Documents. Cristian S. Calude; Vladimiro Sassone. 6th IFIP TC 1/WG 2.2 International Conference on Theoretical Computer Science (TCS) / Held as Part of World Computer Congress (WCC), Sep 2010, Brisbane, Australia. Springer, IFIP Advances in Information and Communication Technology, AICT-323, pp.153-164, 2010, Theoretical Computer Science. 〈10.1007/978-3-642-15240-5_12〉. 〈hal-01054462〉

Partager

Métriques

Consultations de la notice

153

Téléchargements de fichiers

76