Seqcrawler: biological data indexing and browsing platform.

Olivier Sallou 1, Anthony Bretaudeau 2, Aurelien Roult 2
1 Plateforme bioinformatique GenOuest [Rennes]
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, UR1 - Université de Rennes 1, Plateforme Génomique Santé Biogenouest®, Inria Rennes – Bretagne Atlantique
2 SYMBIOSE - Biological systems and models, bioinformatics and sequences
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract:

BACKGROUND: Seqcrawler has its roots in software such as SRS and Lucegene. It provides an indexing platform that eases searching data and meta-data in biological banks, and it can scale to cope with the current flow of data. Many biological bank search tools are available on the Internet, mainly provided by large organizations for searching their own data, but there is a lack of free and open source solutions for browsing one's own data set with a flexible query system that can scale from a single computer to a cloud system. A personal index platform will help labs and bioinformaticians search their meta-data and also build a larger information system with custom subsets of data.

RESULTS: The software is scalable from a single computer to a cloud-based infrastructure. It has been successfully tested in a private cloud with 3 index shards (pieces of the index) hosting ~400 million sequence entries (whole GenBank, UniProt, PDB and others), for a total size of 600 GB, in a fault-tolerant (high-availability) architecture. It has also been successfully integrated with software that adds extra meta-data from BLAST results to enhance users' analysis of results.

CONCLUSIONS: Seqcrawler provides a complete open source search and storage solution for labs or platforms that need to manage large amounts of data and meta-data with a flexible and customizable web interface. All components (search engine, visualization and data storage), though independent, share a common and coherent data system that can be queried through a simple HTTP interface. The solution scales easily and can also provide a high-availability infrastructure.
Document type:
Journal article
BMC Bioinformatics, BioMed Central, 2012, 13 (1), pp.175. 〈10.1186/1471-2105-13-175〉

https://hal.inria.fr/hal-00728279
Contributor: Olivier Sallou
Submitted on: Wednesday, 5 September 2012 - 13:36:33
Last modified on: Wednesday, 11 April 2018 - 01:51:16

Citation

Olivier Sallou, Anthony Bretaudeau, Aurelien Roult. Seqcrawler: biological data indexing and browsing platform. BMC Bioinformatics, BioMed Central, 2012, 13 (1), pp.175. 〈10.1186/1471-2105-13-175〉. 〈hal-00728279〉
