On Correlation Polynomials and Subword Complexity

Abstract : We consider words with letters from a $q-ary$ alphabet $\mathcal{A}$. The kth subword complexity of a word $w ∈\mathcal{A}^*$ is the number of distinct subwords of length $k$ that appear as contiguous subwords of $w$. We analyze subword complexity from both combinatorial and probabilistic viewpoints. Our first main result is a precise analysis of the expected $kth$ subword complexity of a randomly-chosen word $w ∈\mathcal{A}^n$. Our other main result describes, for $w ∈\mathcal{A}^*$, the degree to which one understands the set of all subwords of $w$, provided that one knows only the set of all subwords of some particular length $k$. Our methods rely upon a precise characterization of overlaps between words of length $k$. We use three kinds of correlation polynomials of words of length $k$: unweighted correlation polynomials; correlation polynomials associated to a Bernoulli source; and generalized multivariate correlation polynomials. We survey previously-known results about such polynomials, and we also present some new results concerning correlation polynomials.
Keywords :
Document type :
Conference papers
Domain :

Cited literature [28 references]

https://hal.inria.fr/hal-01184801
Contributor : Coordination Episciences Iam <>
Submitted on : Monday, August 17, 2015 - 5:00:30 PM
Last modification on : Friday, June 28, 2019 - 2:48:03 PM
Long-term archiving on: : Wednesday, November 18, 2015 - 12:18:52 PM

File

dmAH0101.pdf
Publisher files allowed on an open archive

Identifiers

• HAL Id : hal-01184801, version 1

Citation

Irina Gheorghiciuc, Mark Daniel Ward. On Correlation Polynomials and Subword Complexity. 2007 Conference on Analysis of Algorithms, AofA 07, 2007, Juan les Pins, France. pp.1-18. ⟨hal-01184801⟩

Record views