YASS : similarity search in DNA sequences

Gregory Kucherov (1), Laurent Noé (1)
(1) ADAGE - Applying discrete algorithms to genomics
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : Identifying similar regions inside a DNA sequence (repeats) or between two sequences (local alignment) is a fundamental problem in bioinformatics. For this task, many algorithms search for small exact repetitions of fixed size (seeds) and try to extend them into larger approximate repeats. The BLAST family [1] is the most prominent representative of this approach; ASSIRC [7] is another example. A slightly different but related method is implemented in FASTA [6]. REPuter [5] and MUMmer [4] use a different approach, based on suffix trees. We propose a new method that groups multiple seeds together in order to rapidly form large similarity regions, instead of extending individual seeds. In a very restricted form, this idea has been used in later versions of BLAST [2]. Here we push it much further and obtain a more sensitive approach, which allows smaller seed sizes without a considerable loss of time efficiency. For example, for approximate repeats of length at least 100 with 75% similarity between copies, it is more likely to find 3 (or more) distinct seeds of size at least 7 than 1 (or more) seed of size at least 11 (the default seed size of BLASTN). Grouping multiple seeds also reduces the number of fruitless extensions, saving the time otherwise spent computing unnecessary alignment scores.
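The seed-grouping idea in the abstract can be illustrated with a minimal sketch. It indexes all exact w-mers (seeds) of one sequence, collects the hits in the other, and chains hits that lie close together on nearly the same diagonal, reporting only chains containing enough non-overlapping seeds. The function names and the grouping rule (a maximum gap `max_gap` between consecutive hits and a maximum diagonal shift `max_drift`) are hypothetical simplifications for illustration; this is not YASS's actual grouping criterion.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Hit = Tuple[int, int]  # (position in a, position in b) of a shared exact w-mer


def find_seed_hits(a: str, b: str, w: int) -> List[Hit]:
    """Return all pairs (i, j) with a[i:i+w] == b[j:j+w] (exact seeds)."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(a) - w + 1):
        index[a[i:i + w]].append(i)
    hits: List[Hit] = []
    for j in range(len(b) - w + 1):
        for i in index.get(b[j:j + w], ()):  # .get does not insert new keys
            hits.append((i, j))
    return hits


def group_seeds(hits: List[Hit], max_gap: int, max_drift: int) -> List[List[Hit]]:
    """Greedily chain hits (hypothetical rule): a hit joins the first group whose
    last hit precedes it in both sequences by at most max_gap positions in a,
    with a diagonal (i - j) differing by at most max_drift."""
    groups: List[List[Hit]] = []
    for i, j in sorted(hits):
        for g in groups:
            li, lj = g[-1]
            if 0 < i - li <= max_gap and j > lj and abs((i - j) - (li - lj)) <= max_drift:
                g.append((i, j))
                break
        else:  # no compatible group: start a new one
            groups.append([(i, j)])
    return groups


def count_disjoint_seeds(group: List[Hit], w: int) -> int:
    """Count seeds in a group that do not overlap the previously counted seed in a."""
    count, last_end = 0, -1
    for i, _ in group:  # positions in a are strictly increasing within a group
        if i >= last_end:
            count += 1
            last_end = i + w
    return count


def similar_regions(a: str, b: str, w: int, min_seeds: int,
                    max_gap: int, max_drift: int) -> List[Tuple[int, int, int, int]]:
    """Report (a_start, a_end, b_start, b_end) for groups with >= min_seeds seeds."""
    regions = []
    for g in group_seeds(find_seed_hits(a, b, w), max_gap, max_drift):
        if count_disjoint_seeds(g, w) >= min_seeds:
            (i0, j0), (i1, j1) = g[0], g[-1]
            regions.append((i0, i1 + w, j0, j1 + w))
    return regions


if __name__ == "__main__":
    # b differs from a by one substitution (position 6, G -> C)
    print(similar_regions("ACGATTGCCAGT", "ACGATTCCCAGT", w=3, min_seeds=3,
                          max_gap=5, max_drift=0))
```

In this toy example the substitution breaks every 3-mer that covers it, yet the three non-overlapping seeds on either side still fall into one group, so the whole 12-base region is reported without extending any single seed. Allowing `max_drift > 0` would let a group tolerate small indels.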
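The abstract's sensitivity comparison (3 or more seeds of size 7 versus 1 or more seed of size 11, for repeats of length 100 at 75% similarity) can be explored numerically. The sketch below assumes a simple Bernoulli model in which each aligned position matches independently with probability p and indels are ignored, and it interprets "distinct seeds" as non-overlapping runs of w consecutive matches; both are assumptions, not definitions taken from the paper. A dynamic program over (current run length, seeds counted so far) gives the exact probability under that model.

```python
from collections import defaultdict
from typing import Dict, Tuple


def prob_at_least_m_seeds(n: int, p: float, w: int, m: int) -> float:
    """Probability that n independent positions, each a match with probability p,
    contain at least m non-overlapping runs of w consecutive matches."""
    # dist[(r, c)]: probability that the current unfinished run has length r
    # (0 <= r < w) and c seeds (capped at m) have been counted so far.
    dist: Dict[Tuple[int, int], float] = {(0, 0): 1.0}
    for _ in range(n):
        new: Dict[Tuple[int, int], float] = defaultdict(float)
        for (r, c), pr in dist.items():
            # mismatch: the current run of matches is broken
            new[(0, c)] += pr * (1 - p)
            # match: extend the run; completing w matches counts one seed and
            # restarts the run, so seeds counted this way never overlap
            if r + 1 == w:
                new[(0, min(c + 1, m))] += pr * p
            else:
                new[(r + 1, c)] += pr * p
        dist = new
    return sum(pr for (r, c), pr in dist.items() if c == m)


if __name__ == "__main__":
    # Compare the two seeding strategies mentioned in the abstract (model assumptions apply)
    print("P(>=1 seed of size 11): ", prob_at_least_m_seeds(100, 0.75, 11, 1))
    print("P(>=3 seeds of size 7): ", prob_at_least_m_seeds(100, 0.75, 7, 3))
```

Resetting the run length after each completed seed counts floor(L/w) seeds in a run of L matches, which is the largest number of non-overlapping w-windows it contains. A different reading of "distinct seeds" (for example, counting each maximal run only once) would change the second figure.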
Document type :
Conference papers

https://hal.inria.fr/inria-00099599
Contributor : Publications Loria
Submitted on : Tuesday, September 26, 2006 - 9:39:08 AM
Last modification on : Wednesday, December 9, 2020 - 6:02:06 PM

Identifiers

  • HAL Id : inria-00099599, version 1

Citation

Gregory Kucherov, Laurent Noé. YASS : similarity search in DNA sequences. The Seventh Annual International Conference on Research in Computational Molecular Biology - RECOMB'03, Apr 2003, Berlin, Germany, 1 p. ⟨inria-00099599⟩
