Computational pan-genomics: status, promises and challenges

Tobias Marschall 1, 2 Manja Marz 3, 2 Thomas Abeel 4, 5 Louis Dijkstra 6 Bas E. Dutilh 7, 8 Ali Ghaffaari 1, 2 Paul Kersey 9 Wigard P. Kloosterman 10 Veli Makinen 11, 12 Adam M. Novak 13 Benedict Paten 13 David Porubsky 14 Eric Rivals 15, 16 Can Alkan 17 Jasmijn A. Baaijens 18 Paul I. W. De Bakker 18 Valentina Boeva 19, 20, 21, 22 Raoul J. P. Bonnal 23 Francesca Chiaromonte 24 Rayan Chikhi 25, 26 Francesca D. Ciccarelli 24 Robin Cijvat 27 Erwin Datema 28 Cornelia M. Van Duijn 29 Evan E. Eichler 30 Corinna Ernst 31 Eleazar Eskin 32 Erik Garrison 33 Mohammed El-Kebir 18 Gunnar W. Klau 34, 18 Jan O. Korbel 33 Eric-Wubbo Lameijer 35 Benjamin Langmead 36 Marcel Martin 37 Paul Medvedev 38 John C. Mu 39 Pieter Neerincx 40 Klaasjan Ouwens 41 Pierre Peterlongo 42 Nadia Pisanti 43, 34 Sven Rahmann 44 Ben Raphael 45 Knut Reinert 46 Dick De Ridder 47 Jeroen De Ridder 48 Matthias Schlesner 49 Ole Schulz-Trieglaff 50 Ashley D. Sanders 51 Siavash Sheikhizadeh 52 Carl Shneider 53 Sandra Smit 52 Daniel Valenzuela 54 Jiayin Wang 55 Lodewyk Wessels 56 Ying Zhang 18 Victor Guryev 14 Fabio Vandin 57 Kai Ye 55 Alexander Schönhuth 18
16 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
23 Integrative Biology Program [Milano]
INGM - Istituto Nazionale Genetica Molecolare [Milano]
25 BONSAI - Bioinformatics and Sequence Analysis
Université de Lille, Sciences et Technologies, Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189, CNRS - Centre National de la Recherche Scientifique
42 GenScale - Scalable, Optimized and Parallel Algorithms for Genomics
Inria Rennes – Bretagne Atlantique, IRISA_D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
Abstract: Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
Document type: Journal article
Briefings in Bioinformatics, Oxford University Press (OUP), 2018, 19 (1), pp.118-135. 〈10.1093/bib/bbw089〉
Cited literature: 152 references

https://hal.inria.fr/hal-01390478
Contributor: Pierre Peterlongo
Submitted on: Wednesday, 9 November 2016 - 11:32:05
Last modified on: Wednesday, 10 October 2018 - 14:28:13
Document(s) archived on: Tuesday, 14 March 2017 - 23:06:00

File

Brief Bioinform-2016--bib-bbw0...
Publisher files allowed on an open archive

Citation

Tobias Marschall, Manja Marz, Thomas Abeel, Louis Dijkstra, Bas E. Dutilh, et al. Computational pan-genomics: status, promises and challenges. Briefings in Bioinformatics, Oxford University Press (OUP), 2018, 19 (1), pp.118-135. 〈10.1093/bib/bbw089〉. 〈hal-01390478〉
