Mapsembler, targeted assembly of large genomes on a desktop computer

Pierre Peterlongo 1,*, Rayan Chikhi 1
* Corresponding author
1 SYMBIOSE - Biological systems and models, bioinformatics and sequences
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : Background: The analysis of next-generation sequencing data from large genomes is a timely research topic. Sequencers are producing billions of short sequence fragments from newly sequenced organisms. Computational methods for reconstructing sequences (whole-genome assemblers) are typically employed to process such data. However, one of the main drawbacks of these methods is their high memory requirement. Results: We present Mapsembler, an iterative targeted assembler that processes large datasets of reads on commodity hardware. Mapsembler checks for the presence of given regions of interest in the reads and reconstructs their neighborhood, either as a plain sequence (consensus) or as a graph (full sequence structure). We introduce new algorithms to retrieve homologues of a sequence from reads and to construct an extension graph. Conclusions: Mapsembler is the first software that enables de novo discovery, around a region of interest, of gene homologues, SNPs, exon skipping, and other structural events, directly from raw sequencing reads. Compared to traditional assembly software, the memory requirement and execution time of Mapsembler are considerably lower, because data indexing is localized. Mapsembler can be used at http://mapsembler.genouest.org
Complete list of metadata

https://hal.inria.fr/inria-00577218
Contributor : Pierre Peterlongo
Submitted on : Wednesday, March 16, 2011 - 5:42:06 PM
Last modification on : Thursday, February 7, 2019 - 2:00:00 PM
Long-term archiving on : Thursday, November 8, 2012 - 12:00:17 PM

File

RR-7565.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00577218, version 1

Citation

Pierre Peterlongo, Rayan Chikhi. Mapsembler, targeted assembly of large genomes on a desktop computer. [Research Report] RR-7565, INRIA. 2011, pp.17. ⟨inria-00577218⟩
