Lossless filter for multiple repetitions with Hamming distance

Pierre Peterlongo 1, * Nadia Pisanti 2 Frédéric Boyer 3 Alair Pereira Do Lago 4 Marie-France Sagot 3
* Auteur correspondant
1 SYMBIOSE - Biological systems and models, bioinformatics and sequences
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
3 HELIX - Computer science and genomics
Inria Grenoble - Rhône-Alpes, LBBE - Laboratoire de Biométrie et Biologie Evolutive
Abstract : Similarity search in texts, notably in biological sequences, has received substantial attention in the last few years. Numerous filtration and indexing techniques have been created in order to speed up the solution of the problem. However, previous filters were made for speeding up pattern matching, or for finding repetitions between two strings or o ccurring twice in the same string. In this paper, we present an algorithm called Nimbus for filtering strings prior to finding repetitions o ccurring twice or more in a string, or in two or more strings. Nimbus uses gapped seeds that are indexed with a new data structure, called a bi-factor array, that is also presented in this paper. Experimental results show that the filter can be very efficient: preprocessing with Nimbus a data set where one wants to find functional elements using a multiple lo cal alignment to ol such as Glam, the overall execution time can be reduced from 7.5 hours to 2 minutes.
Type de document :
Article dans une revue
Journal of Discrete Algorithms, Elsevier, 2008, 6 (3), pp.497-509. 〈10.1016/j.jda.2007.03.003〉
Liste complète des métadonnées

Littérature citée [23 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/inria-00179731
Contributeur : Pierre Peterlongo <>
Soumis le : mardi 23 octobre 2007 - 10:43:14
Dernière modification le : mercredi 11 avril 2018 - 01:56:49
Document(s) archivé(s) le : jeudi 27 septembre 2012 - 13:10:24

Fichier

jdaNimbus2.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

Citation

Pierre Peterlongo, Nadia Pisanti, Frédéric Boyer, Alair Pereira Do Lago, Marie-France Sagot. Lossless filter for multiple repetitions with Hamming distance. Journal of Discrete Algorithms, Elsevier, 2008, 6 (3), pp.497-509. 〈10.1016/j.jda.2007.03.003〉. 〈inria-00179731〉

Partager

Métriques

Consultations de la notice

451

Téléchargements de fichiers

112