Rime: Repeat identification

Maria Federico; Pierre Peterlongo; Nadia Pisanti; Marie-France Sagot

doi:10.1016/j.dam.2013.02.016

Journal article, Discrete Applied Mathematics, Year: 2014


Maria Federico

Role: Author
PersonId: 943967

Università degli Studi di Modena e Reggio Emilia = University of Modena and Reggio Emilia

Pierre Peterlongo

Role: Author
PersonId: 171998
IdHAL: pierre-peterlongo
ORCID: 0000-0003-0776-6407
IdRef: 12482062X

Scalable, Optimized and Parallel Algorithms for Genomics

Nadia Pisanti

Role: Author
PersonId: 843474

Leiden Institute of Advanced Computer Science [Leiden]

Department of Computer Science [Pisa]

Marie-France Sagot

Role: Author
PersonId: 170068
IdHAL: marie-france-sagot
IdRef: 103537562

Baobab [LBBE]

An algorithmic view on genomes, cells, and environments

Abstract

We present an algorithm for detecting long similar fragments occurring at least twice in a set of biological sequences. The problem becomes computationally challenging when the frequency of a repeat is allowed to increase and when a non-negligible number of insertions, deletions and substitutions are allowed. In this paper we introduce Rime (for Repeat Identification: long, Multiple, and with Edits), an algorithm that performs this task and handles instances whose size and combination of parameters are beyond the reach of other existing methods. This is achieved by using a filter as a preprocessing step and then exploiting the information gathered by the filter in the subsequent repeat inference step. To the best of our knowledge, Rime is the first algorithm that can accurately deal with very long repeats (up to a few thousand), possibly occurring several times, with a rate of differences (substitutions and indels) among copies of the same repeat of 10-15% or even more.
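To make the "rate of differences" concrete, the sketch below computes the standard edit (Levenshtein) distance between two fragments, counting substitutions, insertions and deletions, and checks it against a tolerated rate. This is only an illustration of the similarity notion mentioned in the abstract, not Rime's filter or inference step. The function names and the choice to normalize by the longer fragment are assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of substitutions,
    insertions and deletions turning a into b (row-by-row DP)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def within_difference_rate(a: str, b: str, max_rate: float) -> bool:
    """Hypothetical helper: True if the edit distance is at most max_rate
    times the longer fragment's length (normalization is an assumption)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return True
    return edit_distance(a, b) <= max_rate * longest


if __name__ == "__main__":
    x = "ACGTACGTAC"
    y = "ACGTTCGTAC"  # one substitution relative to x
    print(edit_distance(x, y))
    print(within_difference_rate(x, y, 0.15))
```

The quadratic dynamic program is fine for short fragments, but it becomes costly for repeats a few thousand symbols long that occur many times, which is the setting the abstract describes as challenging.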

Domains

Data Structures and Algorithms [cs.DS]; Bioinformatics [q-bio.QM]; Bioinformatics, Systems Biology [q-bio.QM]

Main file

federico2014.pdf (453.52 KB)

Origin: Files produced by the author(s)

Contributor: Pierre Peterlongo

https://inria.hal.science/hal-00802023

Submitted on: Thursday, June 29, 2017, 09:20:09

Last modified on: Friday, May 17, 2024, 17:12:03

Long-term archiving on: Thursday, January 18, 2018, 01:54:20

Dates and versions

hal-00802023, version 1 (29-06-2017)

Identifiers

HAL Id: hal-00802023, version 1
DOI: 10.1016/j.dam.2013.02.016

Cite

Maria Federico, Pierre Peterlongo, Nadia Pisanti, Marie-France Sagot. Rime: Repeat identification. Discrete Applied Mathematics, 2014, 163 (3), pp.275-286. ⟨10.1016/j.dam.2013.02.016⟩. ⟨hal-00802023⟩


Collections

INSTITUT-TELECOM EC-PARIS UNIV-RENNES1 CNRS INRIA UNIV-LYON1 INSA-RENNES IRISA BAMBOO IRISA-D7 BIOENVIS INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES LBBE UDL ANR UR1-MATH-NUM

579 views

147 downloads
