Skip to Main content Skip to Navigation
Book sections

A Word Counting Graph

Mireille Regnier 1 Zara Kirakossian 2 Eugenia Furletova 3 Mikhail Roytberg 3
1 AMIB - Algorithms and Models for Integrative Biology
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France
Abstract : We study methods for counting occurrences of words from a given set H over an alphabet V in a given text. All words have the same length m. Our goal is the computation of the probability to find p occurrences of words from a set H in a random text of size n, assuming that the text is generated by a Bernoulli or Markov model. We have designed an algorithm solving the problem; the algorithm relies on traversals of a graph, whose set of vertices is associated with the overlaps of words from H. Edges define two oriented subgraphs that can be interpreted as equivalence relations on words of H. Let P (H) be the set of equivalence classes and S be the set of other vertices. The run time for the Bernoulli model is O(np(|P (H)| +|S|)) time and the space complexity is O(pm|S| +|P (H)|). In a Markov model of order K, additional space complexity is O(pm|V | K ) and additional time complexity is O(npm|V | K). Our preprocessing uses a variant of Aho-Corasick automaton and achieves O(m|H|) time complexity. Our algorithm is implemented and provides a significant space improvement in practice. We compare its complexity to the additional improvement due to AhoCorasick minimization.
Document type :
Book sections
Complete list of metadatas

Cited literature [11 references]  Display  Hide  Download
Contributor : Mireille Regnier <>
Submitted on : Sunday, November 29, 2009 - 4:17:13 PM
Last modification on : Wednesday, September 16, 2020 - 5:04:36 PM
Long-term archiving on: : Thursday, June 17, 2010 - 10:42:21 PM


Files produced by the author(s)


  • HAL Id : inria-00437147, version 1



Mireille Regnier, Zara Kirakossian, Eugenia Furletova, Mikhail Roytberg. A Word Counting Graph. Joseph Chan, Jacqueline W. Daykin and M. Sohel Rahman. London Algorithmics 2008: Theory and Practice (Texts in Algorithmics), London College Publications, 31 p., 2009, 978-1904987970. ⟨inria-00437147⟩



Record views


Files downloads