Skip to Main content Skip to Navigation
Conference papers

String Sanitization Under Edit Distance

Abstract : Let W be a string of length n over an alphabet Σ, k be a positive integer, and S be a set of length-k substrings of W. The ETFS problem asks us to construct a string X ED such that: (i) no string of S occurs in X ED ; (ii) the order of all other length-k substrings over Σ is the same in W and in X ED ; and (iii) X ED has minimal edit distance to W. When W represents an individual's data and S represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in O(kn 2) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of |Σ|. Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in O(n 2−δ) time, for any δ > 0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS.
Document type :
Conference papers
Complete list of metadatas

Cited literature [25 references]  Display  Hide  Download


https://hal.inria.fr/hal-02957647
Contributor : Marie-France Sagot <>
Submitted on : Monday, October 5, 2020 - 12:26:11 PM
Last modification on : Tuesday, October 6, 2020 - 3:29:08 AM

Files

LIPIcs-CPM-2020-7.pdf
Publisher files allowed on an open archive

Identifiers

Collections

Citation

Giulia Bernardini, Huiping Chen, Grigorios Loukides, Nadia Pisanti, Solon Pissis, et al.. String Sanitization Under Edit Distance. 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020), 2020, Copenhagen, Denmark. ⟨10.4230/LIPIcs.CPM.2020.7⟩. ⟨hal-02957647⟩

Share

Metrics

Record views

10

Files downloads

51