String Sanitization Under Edit Distance: Improved and Generalized - Archive ouverte HAL Access content directly
Conference Papers Year :

String Sanitization Under Edit Distance: Improved and Generalized

(1) , (2, 3, 4) , (2, 3, 4) , (2)


Let W be a string of length n over an alphabet Σ, k be a positive integer, and S be a set of length-k substrings of W. The ETFS problem asks us to construct a string X ED such that: (i) no string of S occurs in X ED ; (ii) the order of all other length-k substrings over Σ is the same in W and in X ED ; and (iii) X ED has minimal edit distance to W. When W represents an individual's data and S represents a set of confidential patterns, the ETFS problem asks for transforming W to preserve its privacy and its utility [Bernardini et al., ECML PKDD 2019]. ETFS can be solved in O(n 2 k) time [Bernardini et al., CPM 2020]. The same paper shows that ETFS cannot be solved in O(n 2−δ) time, for any δ > 0, unless the Strong Exponential Time Hypothesis (SETH) is false. Our main results can be summarized as follows: • An O(n 2 log 2 k)-time algorithm to solve ETFS. • An O(n 2 log 2 n)-time algorithm to solve AETFS, a generalization of ETFS in which the elements of S can have arbitrary lengths.
Fichier principal
Vignette du fichier
2007.08179.pdf (631.83 Ko) Télécharger le fichier
Origin : Files produced by the author(s)

Dates and versions

hal-03474030 , version 1 (10-12-2021)



Takuya Mieno, Solon P Pissis, Leen Stougie, Michelle Sweering. String Sanitization Under Edit Distance: Improved and Generalized. CPM 2021 - 32nd Annual Symposium on Combinatorial Pattern Matching, Jul 2021, Wrocław, Poland. pp.1-21. ⟨hal-03474030⟩


12 View
14 Download



Gmail Facebook Twitter LinkedIn More