Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Communication Dans Un Congrès Année : 2023

Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications

Résumé

An elastic-degenerate (ED) string T is a sequence of n sets T [1],. .. , T [n] containing m strings in total whose cumulative length is N. We call n, m, and N the length, the cardinality and the size of T , respectively. The language of T is defined as L(T) = {S1 • • • Sn : Si ∈ T [i] for all i ∈ [1, n]}. ED strings have been introduced to represent a set of closely-related DNA sequences, also known as a pangenome. The basic question we investigate here is: Given two ED strings, how fast can we check whether the two languages they represent have a nonempty intersection? We call the underlying problem the ED String Intersection (EDSI) problem. For two ED strings T1 and T2 of lengths n1 and n2, cardinalities m1 and m2, and sizes N1 and N2, respectively, we show the following: There is no O((N1N2) 1-ϵ)-time algorithm, thus no O (N1m2 + N2m1) 1-ϵ-time algorithm and no O (N1n2 + N2n1) 1-ϵ-time algorithm, for any constant ϵ > 0, for EDSI even when T1 and T2 are over a binary alphabet, unless the Strong Exponential-Time Hypothesis is false. There is no combinatorial O((N1 + N2) 1.2-ϵ f (n1, n2))-time algorithm, for any constant ϵ > 0 and any function f , for EDSI even when T1 and T2 are over a binary alphabet, unless the Boolean Matrix Multiplication conjecture is false. An O(N1 log N1 log n1 + N2 log N2 log n2)-time algorithm for outputting a compact (RLE) representation of the intersection language of two unary ED strings. In the case when T1 and T2 are given in a compact representation, we show that the problem is NP-complete. An O(N1m2 + N2m1)-time algorithm for EDSI. An Õ(N ω-1 1 n2 + N ω-1 2 n1)-time algorithm for EDSI, where ω is the exponent of matrix multiplication; the Õ notation suppresses factors that are polylogarithmic in the input size. We also show that the techniques we develop have applications outside of ED string comparison.
Fichier principal
Vignette du fichier
LIPIcs.CPM.2023.11.pdf (1.32 Mo) Télécharger le fichier
Origine : Fichiers éditeurs autorisés sur une archive ouverte

Dates et versions

hal-04365687 , version 1 (28-12-2023)

Licence

Paternité

Identifiants

Citer

Esteban Gabory, Moses Njagi Mwaniki, Nadia Pisanti, Solon P Pissis, Jakub Radoszewski, et al.. Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications. 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023), 2023, Marne-la-Vallée, France. ⟨10.4230/LIPIcs.CPM.2023.11⟩. ⟨hal-04365687⟩
38 Consultations
10 Téléchargements

Altmetric

Partager

Gmail Facebook X LinkedIn More