Skip to Main content Skip to Navigation
Conference papers

Online Dictionary Matching for Streams of XML Documents

Abstract : We consider the online multiple-pattern matching problem for streams of XML documents, when the patterns are expressed as linear XPath expressions containing child operators (/), descendant operators (//) and wildcards (*) but no predicates. For each document in the stream, the task is to determine all occurrences in the document of all the patterns. We present a general multiple-pattern-matching algorithm that is based on a backtracking deterministic finite automaton derived from the classic Aho-Corasick pattern-matching automaton. This automaton is of size linear in the sum of the sizes of the XPath patterns, and the worst-case time bound of the algorithm is better than the time bound of the simulation of linear-size nondeterministic automata. In addition to the worst-case-efficient general solution we present an algorithm with a simple backtracking mechanism that works extremely well for cases in which the backtracking stack remains low. Our experiments show that, when applied to filtering, this simple algorithm scales well as regards the number of patterns (or filters) and is competitive with YFilter, a widely accepted software for XML filtering.
Document type :
Conference papers
Complete list of metadata

Cited literature [13 references]  Display  Hide  Download

https://hal.inria.fr/hal-01054462
Contributor : Hal Ifip <>
Submitted on : Wednesday, August 6, 2014 - 4:26:07 PM
Last modification on : Thursday, November 26, 2020 - 2:56:03 PM
Long-term archiving on: : Wednesday, November 26, 2014 - 1:01:28 AM

File

03230155.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

Citation

Panu Silvasti, Seppo Sippu, Eljas Soisalon-Soininen. Online Dictionary Matching for Streams of XML Documents. 6th IFIP TC 1/WG 2.2 International Conference on Theoretical Computer Science (TCS) / Held as Part of World Computer Congress (WCC), Sep 2010, Brisbane, Australia. pp.153-164, ⟨10.1007/978-3-642-15240-5_12⟩. ⟨hal-01054462⟩

Share

Metrics

Record views

249

Files downloads

290