mreps: a program for exhaustive search for tandem repeats in DNA sequences

Ghizlane Bana (1), Roman Kolpakov, Mathieu Giraud (2), Ralph Rabbat, Gregory Kucherov (1)
1 ADAGE - Applying discrete algorithms to genomics
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: Poster abstract

Introduction
============

The availability of complete genome sequences from many species makes it possible to apply new approaches to characterizing particular genetic patterns. Thanks to bioinformatic tools, whole genome sequences can be used to study the biology of pathogenic bacteria or human genetic diseases. Tandem repeats are contiguously repeated copies of the same nucleotide pattern. The increasing variety of such repeats found through genome projects reveals that repeated DNA is by no means "junk DNA" but is clearly involved in regulating gene expression. Minisatellites, a class of tandem repeats, have been shown to cause diseases by influencing gene expression, modifying coding sequences within genes, or generating fragile sites. This is thought to happen in triplet repeat diseases, such as Huntington's disease, which are caused by an increased number of pattern copies in repeats located within specific genes.

Many prokaryotic genomes contain various types of repeated DNA occurring in genes, in intergenic sequences, or in transposable elements. Repeats, such as long repeats, are representative of important evolutionary mechanisms that allow bacteria to adapt quickly to environmental changes. Indeed, many pathogens have developed the ability to alter surface-exposed molecules, most often in response to selective pressure from the host immune system. Bacteria thus present very diverse evolutionary strategies linked to pathogenesis. One of these genetic mechanisms involves a change in the number of repeated patterns, which causes phenotype variation.

The polymorphic character of repeats makes them useful as DNA markers for genome mapping, an important approach in linkage analysis, which can be used to search for genes involved in a wide range of disorders. This shows the relative precision with which genome sequences can be used to investigate genetic diseases.
Results
=======

mreps is a computer program for exhaustively characterizing all tandem repeats in a given DNA sequence. The distribution, some documentation, references, and other related information are available online.

mreps implements combinatorial algorithms: it finds all repetitions that satisfy the user specification and does not rely on any statistical model or heuristic. As a consequence, mreps does not associate any score with the repetitions it finds. All repetitions that satisfy the user specification are considered a priori "equally good". mreps leaves it to the user to judge which repetitions are interesting: it finds them all, and the user can then sort them by any appropriate criterion. To simplify this job, mreps can output the results in XML format so that they can be easily processed by other tools.

mreps finds exact or approximate repetitions, depending on the error parameter specified by the user. The current version of mreps handles only substitution errors and does not handle indels.

Example
=======

Applying mreps to whole genomes can readily reveal interesting repeats. As an example, we applied mreps to Neisseria meningitidis MC58, a causative agent of meningitis and septicaemia. Its genome is 2.27 Mb long, and mreps revealed an exact tandem repeat with a 32 Kb pattern, located in a coding region. The exact biological meaning of this repeat is still unknown. Here is the corresponding mreps output.

    Processing sequence 'gb|AE002098|AE002098 Neisseria meningitidis serogroup B strain MC58 complete gen'
    * Processing window [1 : 2272351] *
       from   ->       to :   size   "per."   [exp.]   repetition
     ---
     1135353  ->  1199546 :  64194  "32036"   [2.00]
     ---

In practice, mreps is able to process genome sequences of up to 30 Mb and to detect repetitions of unbounded pattern size. On a regular 400 MHz PC running Linux, the program replies instantaneously on sequences of up to 1 Mb. A typical run takes about 1 second on a 2 Mb sequence, 30 seconds on an 8 Mb sequence, and 1.5 minutes on genomes reaching 30 Mb.
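To make the columns of the output above concrete, here is a minimal Python sketch of how an exact tandem repeat relates to its period and exponent. It is not the algorithm used by mreps: it is a naive quadratic-time scan written only for illustration, and the function names and the `min_exponent` and `max_period` parameters are hypothetical. Coordinates are 1-based and inclusive, and the exponent is taken to be size divided by period, which is an assumption consistent with the figures shown in the output.

```python
def size_and_exponent(start, end, period):
    """Size and exponent of a repetition given 1-based inclusive bounds.

    Assumption: exponent = size / period, rounded to two decimals.
    """
    size = end - start + 1
    return size, round(size / period, 2)


def find_exact_tandem_repeats(seq, min_exponent=2.0, max_period=None):
    """Naive scan for maximal exact repetitions (illustration only).

    Returns a sorted list of (start, end, size, period, exponent) tuples.
    Each interval is reported once, with its smallest period.
    """
    n = len(seq)
    if max_period is None:
        max_period = n // 2
    seen = set()   # intervals already reported with a smaller period
    found = []
    for p in range(1, max_period + 1):
        i = 0
        while i + p < n:
            if seq[i] != seq[i + p]:
                i += 1
                continue
            # Extend to the right while the sequence matches itself
            # shifted by p positions.
            j = i
            while j + p < n and seq[j] == seq[j + p]:
                j += 1
            start, end = i + 1, j + p   # convert to 1-based inclusive
            size, exponent = size_and_exponent(start, end, p)
            if size / p >= min_exponent and (start, end) not in seen:
                seen.add((start, end))
                found.append((start, end, size, p, exponent))
            i = j + 1
    return sorted(found)


if __name__ == "__main__":
    # Toy sequence: "GACGACGACGA" is a repetition of period 3.
    for rep in find_exact_tandem_repeats("GGACGACGACGATT"):
        print(rep)
    # Figures taken from the mreps output line shown above.
    print(size_and_exponent(1135353, 1199546, 32036))
```

On the toy sequence, the scan reports the period-3 repetition spanning positions 2 to 12 (size 11, exponent 3.67) along with the two short period-1 repetitions "GG" and "TT". Applying `size_and_exponent` to the bounds and period from the Neisseria meningitidis example gives a size of 64194 and an exponent of 2.0, matching the reported line.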
Document type:
Conference paper
Sixth Annual International Conference on Research in Computational Biology - RECOMB 2002, Apr 2002, Washington, DC, US, 2002
Contributor: Publications Loria
Submitted on: Tuesday, 26 September 2006 - 14:52:33
Last modified on: Thursday, 11 January 2018 - 06:19:48


  • HAL Id : inria-00100869, version 1



Ghizlane Bana, Roman Kolpakov, Mathieu Giraud, Ralph Rabbat, Gregory Kucherov. mreps: a program for exhaustive search for tandem repeats in DNA sequences. Sixth Annual International Conference on Research in Computational Biology - RECOMB 2002, Apr 2002, Washington, DC, US, 2002. 〈inria-00100869〉


