A Word Counting Graph - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Chapitre D'ouvrage Année : 2009

A Word Counting Graph

Résumé

We study methods for counting occurrences of words from a given set H over an alphabet V in a given text. All words have the same length m. Our goal is the computation of the probability to find p occurrences of words from a set H in a random text of size n, assuming that the text is generated by a Bernoulli or Markov model. We have designed an algorithm solving the problem; the algorithm relies on traversals of a graph, whose set of vertices is associated with the overlaps of words from H. Edges define two oriented subgraphs that can be interpreted as equivalence relations on words of H. Let P (H) be the set of equivalence classes and S be the set of other vertices. The run time for the Bernoulli model is O(np(|P (H)| +|S|)) time and the space complexity is O(pm|S| +|P (H)|). In a Markov model of order K, additional space complexity is O(pm|V | K ) and additional time complexity is O(npm|V | K). Our preprocessing uses a variant of Aho-Corasick automaton and achieves O(m|H|) time complexity. Our algorithm is implemented and provides a significant space improvement in practice. We compare its complexity to the additional improvement due to AhoCorasick minimization.
Fichier principal
Vignette du fichier
Final_refukiro.pdf (575.34 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

inria-00437147 , version 1 (29-11-2009)

Identifiants

  • HAL Id : inria-00437147 , version 1

Citer

Mireille Regnier, Zara Kirakossian, Eugenia Furletova, Mikhail A. Roytberg. A Word Counting Graph. Joseph Chan, Jacqueline W. Daykin and M. Sohel Rahman. London Algorithmics 2008: Theory and Practice (Texts in Algorithmics), London College Publications, 31 p., 2009, 978-1904987970. ⟨inria-00437147⟩
254 Consultations
187 Téléchargements

Partager

Gmail Facebook X LinkedIn More