Rime: Repeat identification

Maria Federico (1), Pierre Peterlongo (2), Nadia Pisanti (3, 4), Marie-France Sagot (5, 6)
(2) GenScale - Scalable, Optimized and Parallel Algorithms for Genomics; IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE, Inria Rennes – Bretagne Atlantique
(5) Baobab; PEGASE - Département PEGASE [LBBE]
(6) BAMBOO - An algorithmic view on genomes, cells, and environments; Inria Grenoble - Rhône-Alpes, LBBE - Laboratoire de Biométrie et Biologie Evolutive - UMR 5558
Abstract: We present an algorithm for detecting long similar fragments that occur at least twice in a set of biological sequences. The problem becomes computationally challenging when a repeat may occur many times and when a non-negligible number of insertions, deletions and substitutions are allowed among its copies. In this paper we introduce this algorithm, Rime (for Repeat Identification: long, Multiple, and with Edits), which performs this task and handles instances whose size and combination of parameters cannot be handled by other currently existing methods. It achieves this by applying a filter as a preprocessing step and then exploiting the information gathered by the filter in the subsequent repeat inference step. To the best of our knowledge, Rime is the first algorithm that can accurately deal with very long repeats (up to a few thousand positions long), possibly occurring several times, with a rate of differences (substitutions and indels) among copies of the same repeat of 10-15% or even more.
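The abstract does not describe the filter itself, so the following is only an illustration of the general filter-then-verify idea, not Rime's actual filter. It is a minimal Python sketch of a classic q-gram counting filter based on the q-gram lemma: a string of length m and any string within k edits of it share at least m + 1 - (k + 1)q q-grams. The helper names, the choice of q, and the convention that the difference rate is an integer percentage of the shorter fragment's length are all assumptions made for this example.

```python
from collections import Counter


def qgram_threshold(length: int, edits: int, q: int) -> int:
    """q-gram lemma: a string of length `length` and any string within
    `edits` insertions/deletions/substitutions of it share at least
    length + 1 - (edits + 1) * q q-grams (counted with multiplicity)."""
    return length + 1 - (edits + 1) * q


def shared_qgrams(a: str, b: str, q: int) -> int:
    """Size of the multiset intersection of the q-grams of a and b."""
    ca = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    cb = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    return sum((ca & cb).values())


def may_be_repeat_pair(a: str, b: str, max_diff_percent: int, q: int) -> bool:
    """Hypothetical necessary-condition filter (not Rime's own filter).

    Returns False only when a and b cannot be within the edit budget;
    True means "keep the pair for the exact, more expensive verification".
    Assumption: the budget is a percentage of the shorter string's length.
    """
    length = min(len(a), len(b))
    # Integer edit budget, e.g. 15% of 1000 positions -> 150 edits.
    edits = (max_diff_percent * length) // 100
    threshold = qgram_threshold(length, edits, q)
    if threshold <= 0:
        # The lemma gives no information: the pair cannot be discarded.
        return True
    return shared_qgrams(a, b, q) >= threshold


if __name__ == "__main__":
    # 1000-position copies with 15% differences (150 edits) and q = 5.
    print(qgram_threshold(1000, 150, 5))  # prints 246
    # Same budget with q = 7: 1001 - 1057, so no pair can be pruned.
    print(qgram_threshold(1000, 150, 7))  # prints -56
```

Because the lemma is a necessary condition, a filter of this kind never discards a pair that really is within the edit budget; it only forwards candidates to a costlier verification step. The second printed value shows how quickly such a bound weakens as the allowed difference rate grows relative to q, which illustrates why high divergence among copies makes filtering harder.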

Cited literature: 32 references

https://hal.inria.fr/hal-00802023
Contributor: Pierre Peterlongo
Submitted on: Thursday, June 29, 2017 - 9:20:09 AM
Last modification on: Friday, September 27, 2019 - 10:10:19 AM
Long-term archiving on: Thursday, January 18, 2018 - 1:54:20 AM

File: federico2014.pdf (produced by the author(s))


Citation

Maria Federico, Pierre Peterlongo, Nadia Pisanti, Marie-France Sagot. Rime: Repeat identification. Discrete Applied Mathematics, Elsevier, 2014, 163 (3), pp.275-286. ⟨10.1016/j.dam.2013.02.016⟩. ⟨hal-00802023⟩
