Computational pan-genomics: status, promises and challenges

Tobias Marschall (1, 2, *), Manja Marz (3, 2), Thomas Abeel (4, 5), Louis Dijkstra (6), Bas E. Dutilh (7, 8), Ali Ghaffaari (1, 2), Paul Kersey (9), Wigard P. Kloosterman (10), Veli Makinen (11, 12), Adam M. Novak (13), Benedict Paten (13), David Porubsky (14), Eric Rivals (15, 16), Can Alkan (17), Jasmijn A. Baaijens (18), Paul I. W. de Bakker (18), Valentina Boeva (19, 20, 21, 22), Raoul J. P. Bonnal (23), Francesca Chiaromonte (24), Rayan Chikhi (25, 26), Francesca D. Ciccarelli (24), Robin Cijvat (27), Erwin Datema (28), Cornelia M. Van Duijn (29), Evan E. Eichler (30, 31), Corinna Ernst (32), Eleazar Eskin (33), Erik Garrison (34), Mohammed El-Kebir (18), Gunnar Klau (35, 18), Jan O. Korbel (34), Eric-Wubbo Lameijer (36), Benjamin Langmead (37), Marcel Martin (38), Paul Medvedev (39), John C. Mu (40), Pieter Neerincx (41), Klaasjan Ouwens (42), Pierre Peterlongo (43), Nadia Pisanti (44, 35), Sven Rahmann (45), Ben Raphael (46), Knut Reinert (47), Dick de Ridder (48), Jeroen de Ridder (49), Matthias Schlesner (50), Ole Schulz-Trieglaff (51), Ashley D. Sanders (52), Siavash Sheikhizadeh (53), Carl Shneider (54), Sandra Smit (53), Daniel Valenzuela (55), Jiayin Wang (56), Lodewyk Wessels (57), Ying Zhang (18), Victor Guryev (14), Fabio Vandin (58), Kai Ye (56), Alexander Schönhuth (18)
* Corresponding author
16 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
23 Integrative Biology Program [Milano]
INGM - Istituto Nazionale Genetica Molecolare [Milano]
25 BONSAI - Bioinformatics and Sequence Analysis
CNRS - Centre National de la Recherche Scientifique, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189, Université de Lille, Sciences et Technologies, Inria Lille - Nord Europe
43 GenScale - Scalable, Optimized and Parallel Algorithms for Genomics
Inria Rennes – Bretagne Atlantique, IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
Abstract: Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
Document type: Journal article

Cited literature: 152 references

https://hal.inria.fr/hal-01390478
Contributor: Pierre Peterlongo
Submitted on: Wednesday, November 9, 2016 - 11:32:05 AM
Last modification on: Monday, November 22, 2021 - 1:52:23 PM
Long-term archiving on: Tuesday, March 14, 2017 - 11:06:00 PM

File

Brief Bioinform-2016--bib-bbw0...
Publisher files allowed on an open archive

Citation

Tobias Marschall, Manja Marz, Thomas Abeel, Louis Dijkstra, Bas E. Dutilh, et al. Computational pan-genomics: status, promises and challenges. Briefings in Bioinformatics, Oxford University Press (OUP), 2018, 19 (1), pp.118-135. ⟨10.1093/bib/bbw089⟩. ⟨hal-01390478⟩

Metrics

Record views: 4040

File downloads: 1888