Determinization and Minimization of Automata for Nested Words Revisited

Joachim Niehren 1 Momar Sakho 1
1 LINKS - Linking Dynamic Data
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract : We consider the problem of determinizing and minimizing automata for nested words in practice. For this we compile the nested regular expressions (NREs) from the usual XPath benchmark to nested word automata (NWAs). The determinization of these NWAs, however, fails to produce reasonably small automata. In the best case, huge deterministic NWAs are produced after a few hours, even for relatively small NREs of the benchmark. We propose a different approach to the determinization of automata for nested words. For this, we introduce stepwise hedge automata (SHAs), which naturally generalize both (stepwise) tree automata and finite word automata. We then show how to determinize SHAs, yielding reasonably small deterministic automata for the NREs from the XPath benchmark. The size of deterministic SHAs can be reduced further by a novel minimization algorithm for a subclass of SHAs. In order to understand why the new approach to determinization and minimization works so well, we investigate the relationship between NWAs and SHAs further. Clearly, deterministic SHAs can be compiled to deterministic NWAs in linear time, and conversely, NWAs can be compiled to nondeterministic SHAs in polynomial time. Therefore, we can use SHAs as intermediates for determinizing NWAs, while avoiding the huge size increase caused by the usual determinization algorithm for NWAs. Notably, the NWAs obtained from the SHAs perform bottom-up and left-to-right computations only, but no top-down computations. This NWA behavior can be characterized syntactically by the (weak) single-entry property, suggesting a close relationship between SHAs and single-entry NWAs. In particular, it turns out that the usual determinization algorithm for NWAs behaves well for single-entry NWAs, while it quickly explodes without the single-entry property. Furthermore, it is known that the class of deterministic multi-module single-entry NWAs enjoys unique minimization. The subclass of deterministic SHAs to which our novel minimization algorithm applies is different, though, in that we do not impose multiple modules. As further optimizations for reducing the sizes of the constructed SHAs, we propose schema-based cleaning and symbolic representations based on apply-else rules, which can be maintained by determinization. We implemented the optimizations and report the experimental results for the automata constructed for the XPathMark benchmark.
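
The abstract states that SHAs compute bottom-up and left-to-right only. As a rough illustration, and not the authors' implementation, the following Python sketch evaluates a deterministic SHA on a hedge. It assumes a common presentation of SHAs with three kinds of rules: letter rules, a single state in which the evaluation of every subtree starts, and apply rules that combine the current state with the state reached inside a subtree. The class name `DetSHA`, the rule encodings, and the example language are hypothetical choices made for this sketch.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

# A hedge is a list of items; an item is a letter (str) or a tree,
# where a tree is represented by the nested list of its children.
Hedge = List[Union[str, "Hedge"]]


class DetSHA:
    """Hypothetical deterministic stepwise hedge automaton (sketch)."""

    def __init__(
        self,
        initial: int,
        finals: Set[int],
        tree_initial: int,
        letter_rules: Dict[Tuple[int, str], int],
        apply_rules: Dict[Tuple[int, int], int],
    ) -> None:
        self.initial = initial
        self.finals = finals
        self.tree_initial = tree_initial    # state in which every subtree starts
        self.letter_rules = letter_rules    # (state, letter) -> state
        self.apply_rules = apply_rules      # (state, subtree state) -> state

    def run(self, hedge: Hedge, state: int) -> Optional[int]:
        """Read the hedge left to right; return None if the run gets stuck."""
        current: Optional[int] = state
        for item in hedge:
            if isinstance(item, str):
                current = self.letter_rules.get((current, item))
            else:
                # Bottom-up step: the subtree is evaluated on its own,
                # without any information about the state outside of it.
                inner = self.run(item, self.tree_initial)
                if inner is None:
                    return None
                current = self.apply_rules.get((current, inner))
            if current is None:
                return None
        return current

    def accepts(self, hedge: Hedge) -> bool:
        return self.run(hedge, self.initial) in self.finals


# Example language (an assumption for illustration): hedges over {a} in which
# every nesting level contains an even number of letters a.
sha = DetSHA(
    initial=0,
    finals={0},
    tree_initial=0,
    letter_rules={(0, "a"): 1, (1, "a"): 0},
    # A subtree may only be applied if its own run ended in state 0.
    apply_rules={(0, 0): 0, (1, 0): 1},
)
```

In this sketch, `sha.accepts(["a", ["a", "a"], "a"])` holds, whereas `sha.accepts([["a"]])` does not, because the inner hedge ends in state 1 and no apply rule consumes that state. The run of a subtree never depends on the state reached before it, which is the bottom-up, left-to-right behavior mentioned in the abstract.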
Document type :
Journal articles

https://hal.inria.fr/hal-03134596
Contributor : Inria Links
Submitted on : Monday, February 8, 2021 - 2:18:06 PM
Last modification on : Friday, January 21, 2022 - 3:12:34 AM
Long-term archiving on : Sunday, May 9, 2021 - 7:25:15 PM

File

0.pdf
Files produced by the author(s)

Citation

Joachim Niehren, Momar Sakho. Determinization and Minimization of Automata for Nested Words Revisited. Algorithms, MDPI, 2021, ⟨10.3390/a14030068⟩. ⟨hal-03134596⟩
