Finding and Characterizing Repeats in Plant Genomes

Jacques Nicolas 1, Pierre Peterlongo 2, Sébastien Tempel 3
1 Dyliss - Dynamics, Logics and Inference for biological Systems and Sequences
IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE, Inria Rennes – Bretagne Atlantique
2 GenScale - Scalable, Optimized and Parallel Algorithms for Genomics
Inria Rennes – Bretagne Atlantique, IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
Abstract: Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter offers a guided tour of available software that can help biologists look for these repeats and check hypothetical models intended to characterize their structures. Since transposable elements are a major source of repeats in plants, many methods have been used or developed for this large class of sequences. These methods are representative of the range of tools available for other classes of repeats, so we devote a whole section to this topic, together with a selection of the main existing software. To better understand how these tools work and how repeats may be found efficiently in genomes, it is necessary to look at the technical issues involved in the large-scale search for these structures. Indeed, it may be hard to keep up with the profusion of proposals in this dynamic field, and the rest of the chapter is devoted to the foundations of the search for repeats and more complex patterns. The second section introduces the key concepts needed to understand the current state of the art in playing with words, applied to genomic sequences. This can be seen as the first stage of a very general approach called linguistic analysis, which is concerned with the analysis of natural or artificial texts. Words, the lexical level, correspond to simple repeated entities in texts or strings. In fact, biologists need to represent more complex entities, where a repeat family is built on more abstract structures, including direct or inverted small repeats, motifs, and composition constraints, as well as ordering and distance constraints between these elementary blocks. In linguistic terms, this corresponds to the syntactic level of a language. The last section introduces concepts and practical tools that can be used to reach this syntactic level in biological sequence analysis.
Document type:
Book chapter
David Edwards. Plant Bioinformatics: Methods and Protocols, Humana Press - Springer Science+Business Media, pp.365, 2015, Methods in Molecular Biology, 978-1-4939-3166-8. 〈10.1007/978-1-4939-3167-5_17〉

Cited literature: 97 references

https://hal.inria.fr/hal-01228488
Contributor: Jacques Nicolas
Submitted on: Thursday, November 19, 2015 - 17:15:44
Last modified on: Thursday, November 15, 2018 - 11:57:53
Document(s) archived on: Friday, April 28, 2017 - 20:03:22

File

plant_repeats_nicolas_finHAL.p...
Files produced by the author(s)

Citation

Jacques Nicolas, Pierre Peterlongo, Sébastien Tempel. Finding and Characterizing Repeats in Plant Genomes. David Edwards. Plant Bioinformatics: Methods and Protocols, Humana Press - Springer Science+Business Media, pp.365, 2015, Methods in Molecular Biology, 978-1-4939-3166-8. 〈10.1007/978-1-4939-3167-5_17〉. 〈hal-01228488〉
