YASS: similarity search in DNA sequences

Gregory Kucherov¹, Laurent Noé¹
¹ ADAGE - Applying discrete algorithms to genomics
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: Identifying similarity regions inside a DNA sequence (repeats), or between two sequences (local alignment), is a fundamental problem in bioinformatics. For this task, many algorithms search for small exact repetitions of fixed size (seeds) and try to extend them into larger approximate repeats. The BLAST family [1] is the most prominent representative of this approach; ASSIRC [7] is another example. A slightly different but related method is implemented in FASTA [6]. REPuter [5] and MUMmer [4] use a different approach, based on suffix trees. We propose a new method that groups together multiple seeds in order to rapidly form large similarity regions, instead of extending individual seeds. In a very restricted form, this idea has been used in recent versions of BLAST [2]. Here we push it much further and obtain a more sensitive approach, which allows smaller seed sizes without a considerable drop in time efficiency. For example, for approximate repeats of length at least 100 with 75% similarity between copies, finding 3 (or more) distinct seeds of size at least 7 is more likely than finding 1 (or more) seed of size at least 11 (the default seed size of BLASTN). Grouping multiple seeds also reduces the number of unproductive extensions, saving the time otherwise spent computing unnecessary alignment scores.
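The seed-grouping idea can be illustrated with a minimal Python sketch. It is not YASS's actual algorithm: the hash index, the greedy chaining rule, and the parameters `max_diag_shift`, `max_gap` and `min_seeds` are hypothetical choices made for illustration. Exact k-mer hits between a query and a target are collected, hits on nearby diagonals that are close to each other are chained into groups, and only groups containing enough distinct (here: non-overlapping in the query) seeds are kept as candidates for extension.

```python
from collections import defaultdict

Hit = tuple[int, int]  # (position in query, position in target)


def kmer_hits(query: str, target: str, k: int) -> list[Hit]:
    """Return every (i, j) with query[i:i+k] == target[j:j+k], ordered by i then j."""
    index: dict[str, list[int]] = defaultdict(list)
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


def count_distinct_seeds(group: list[Hit], k: int) -> int:
    """Greedily count hits that do not overlap the previously counted hit in the query."""
    count, last = 0, None
    for i, _ in group:
        if last is None or i - last >= k:
            count += 1
            last = i
    return count


def group_seeds(hits: list[Hit], k: int, max_diag_shift: int,
                max_gap: int, min_seeds: int) -> list[list[Hit]]:
    """Chain hits into groups with a hypothetical greedy rule: a hit joins the first
    group whose last hit lies within max_diag_shift diagonals and at most max_gap
    query positions before it. Keep groups with at least min_seeds distinct seeds."""
    groups: list[list[Hit]] = []
    for i, j in sorted(hits):
        for group in groups:
            li, lj = group[-1]
            same_band = abs((j - i) - (lj - li)) <= max_diag_shift
            if same_band and 0 < i - li <= max_gap:
                group.append((i, j))
                break
        else:  # no compatible group found: start a new one
            groups.append([(i, j)])
    return [g for g in groups if count_distinct_seeds(g, k) >= min_seeds]


if __name__ == "__main__":
    # One substitution (C -> T at query position 5) breaks the exact match in two,
    # but both parts lie on diagonal 0 and are chained into a single group.
    hits = kmer_hits("AAACCTGGGTTT", "AAACCCGGGTTT", k=3)
    print(group_seeds(hits, k=3, max_diag_shift=0, max_gap=5, min_seeds=3))
```

In this sketch, raising `min_seeds` discards isolated hits before any alignment score is computed, while a small `k` makes individual hits more frequent; the group requirement is what filters them.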
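The comparison between 3 seeds of size 7 and 1 seed of size 11 can be checked under a simple model, sketched here as an assumption rather than the authors' own computation: an ungapped alignment of length 100 in which each position matches independently with probability 0.75, and "distinct seeds" read as non-overlapping runs of matches counted greedily (a maximal run of L matches yields floor(L/w) seeds of size w). A dynamic program over (current run length, seeds found so far) gives the exact probability under this model. The function name `prob_at_least_seeds` is hypothetical.

```python
from collections import defaultdict


def prob_at_least_seeds(n: int, w: int, m: int, p: float) -> float:
    """Probability that an i.i.d. match/mismatch string of length n (match
    probability p) contains at least m non-overlapping runs of w matches,
    counted greedily: a maximal run of L matches yields L // w seeds."""
    # State (r, c): r = matches since the last counted seed or mismatch
    # (0 <= r < w), c = seeds counted so far (0 <= c < m).
    dist: dict[tuple[int, int], float] = {(0, 0): 1.0}
    reached = 0.0  # probability mass that already has m seeds (absorbing)
    for _ in range(n):
        nxt: dict[tuple[int, int], float] = defaultdict(float)
        for (r, c), prob in dist.items():
            nxt[(0, c)] += prob * (1 - p)  # mismatch: the run resets
            if r + 1 < w:
                nxt[(r + 1, c)] += prob * p  # match extends the current run
            elif c + 1 < m:
                nxt[(0, c + 1)] += prob * p  # run reaches w: count a seed, restart
            else:
                reached += prob * p  # the m-th seed is completed
        dist = nxt
    return reached


if __name__ == "__main__":
    n, p = 100, 0.75  # assumed model: 100 positions, 75% identity, no indels
    one_of_11 = prob_at_least_seeds(n, w=11, m=1, p=p)
    three_of_7 = prob_at_least_seeds(n, w=7, m=3, p=p)
    print(f"P(>= 1 seed of size 11) = {one_of_11:.3f}")
    print(f"P(>= 3 seeds of size 7) = {three_of_7:.3f}")
```

With `m = 1` the same recurrence gives the probability of at least one seed of size `w`, so both sides of the comparison use one model. The numbers depend on that model: allowing indels, non-uniform substitutions, or counting overlapping seeds would change them.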
Document type:
Conference paper
The Seventh Annual International Conference on Research in Computational Molecular Biology - RECOMB'03, Apr 2003, Berlin, Germany, 1 p, 2003

https://hal.inria.fr/inria-00099599
Contributor: Publications Loria
Submitted on: Tuesday, 26 September 2006, 09:39:08
Last modified on: Thursday, 11 January 2018, 06:19:48

Identifiers

  • HAL Id: inria-00099599, version 1

Citation

Gregory Kucherov, Laurent Noé. YASS: similarity search in DNA sequences. The Seventh Annual International Conference on Research in Computational Molecular Biology - RECOMB'03, Apr 2003, Berlin, Germany, 1 p, 2003. 〈inria-00099599〉