Finding Long and Multiple Repeats with Edit Distance

Maria Federico (1), Pierre Peterlongo (2), Nadia Pisanti (3), Marie-France Sagot (4)
(2) SYMBIOSE - Biological systems and models, bioinformatics and sequences
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
(4) BAMBOO - An algorithmic view on genomes, cells, and environments
Inria Grenoble - Rhône-Alpes, LBBE - Laboratoire de Biométrie et Biologie Evolutive
Abstract: We present a tool for detecting long similar fragments that occur two or more times in a set of biological sequences. The problem has interesting applications in the analysis of biological sequences and their correlation, and becomes computationally challenging when a non-negligible number of insertions, deletions and substitutions are allowed. For this reason, exact exhaustive methods are hardly of practical use. In this paper we introduce a tool, FilmRed, that performs this task and handles instances whose size and parameter combinations cannot be handled by any existing tool. This is achieved by using a filter as a preprocessing step, and by reusing the information gathered by the filter in the subsequent inference phase. To the best of our knowledge, FilmRed is the first ab initio tool that can deal with repeats that may occur several times, that are hundreds or thousands of bases long, and whose occurrences may differ in more than 10% of their positions in terms of substitutions and indels.
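To make the similarity criterion in the abstract concrete, here is a minimal sketch of the edit distance between two fragments, together with a check that they differ in at most a given fraction of their positions. It is a textbook dynamic-programming routine and not FilmRed's filter or inference phase; the names `edit_distance`, `is_approximate_repeat` and `max_error_rate` are hypothetical, and normalising by the longer fragment is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def is_approximate_repeat(a: str, b: str, max_error_rate: float) -> bool:
    """True if a and b differ by at most max_error_rate of the longer length.

    Assumption: the error rate is measured against the longer fragment.
    """
    longest = max(len(a), len(b))
    return edit_distance(a, b) <= max_error_rate * longest
```

This routine takes time proportional to the product of the two fragment lengths, so running it on every pair of long fragments quickly becomes expensive, which is consistent with the abstract's remark that exact exhaustive methods are hardly of practical use.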
Document type:
Conference paper
The Prague Stringology Conference 2011, Aug 2011, Prague, Czech Republic. 2011
Contributor: Pierre Peterlongo
Submitted on: Tuesday, 12 July 2011 - 14:27:13
Last modified on: Wednesday, 21 February 2018 - 01:25:08


  • HAL Id: inria-00608208, version 1


Maria Federico, Pierre Peterlongo, Nadia Pisanti, Marie-France Sagot. Finding Long and Multiple Repeats with Edit Distance. The Prague Stringology Conference 2011, Aug 2011, Prague, Czech Republic. 2011. 〈inria-00608208〉


