Online Dictionary Matching for Streams of XML Documents - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Communication Dans Un Congrès Année : 2010

Online Dictionary Matching for Streams of XML Documents

Résumé

We consider the online multiple-pattern matching problem for streams of XML documents, when the patterns are expressed as linear XPath expressions containing child operators (/), descendant operators (//) and wildcards (*) but no predicates. For each document in the stream, the task is to determine all occurrences in the document of all the patterns. We present a general multiple-pattern-matching algorithm that is based on a backtracking deterministic finite automaton derived from the classic Aho-Corasick pattern-matching automaton. This automaton is of size linear in the sum of the sizes of the XPath patterns, and the worst-case time bound of the algorithm is better than the time bound of the simulation of linear-size nondeterministic automata. In addition to the worst-case-efficient general solution we present an algorithm with a simple backtracking mechanism that works extremely well for cases in which the backtracking stack remains low. Our experiments show that, when applied to filtering, this simple algorithm scales well as regards the number of patterns (or filters) and is competitive with YFilter, a widely accepted software for XML filtering.
Fichier principal
Vignette du fichier
03230155.pdf (135.89 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

hal-01054462 , version 1 (06-08-2014)

Licence

Paternité

Identifiants

Citer

Panu Silvasti, Seppo Sippu, Eljas Soisalon-Soininen. Online Dictionary Matching for Streams of XML Documents. 6th IFIP TC 1/WG 2.2 International Conference on Theoretical Computer Science (TCS) / Held as Part of World Computer Congress (WCC), Sep 2010, Brisbane, Australia. pp.153-164, ⟨10.1007/978-3-642-15240-5_12⟩. ⟨hal-01054462⟩
65 Consultations
104 Téléchargements

Altmetric

Partager

Gmail Facebook X LinkedIn More