Integrating and Warehousing Liver Gene Expression Data and Related Biomedical Resources in GEDAW

Abstract: Researchers at the medical research institute Inserm U522, which specializes in the liver, use high-throughput technologies to diagnose liver disease states. They seek to identify the set of dysregulated genes in different physiopathological situations, along with the molecular regulation mechanisms involved in the occurrence of these diseases, with the medium-term goal of developing new diagnostic and therapeutic tools. Addressing such a complex question requires considering both the data generated on the genes by in-house transcriptome experiments and the annotations extracted from the many heterogeneous, publicly available biomedical resources. This paper presents GEDAW, a gene expression data warehouse developed to assist such discovery processes. The distinctive feature of GEDAW is that it systematically integrates gene information from a multitude of structured data sources. These sources include: i) XML records from GenBank, used to annotate gene sequence features and integrated using a schema mapping approach; ii) an in-house relational database that stores detailed experimental data on the liver genes and serves as a permanent source of expression levels for the warehouse, without the unnecessary experimental details; and iii) a semi-structured data source called BioMeKE-XML that provides, for each gene, its nomenclature, its functional annotation according to the Gene Ontology, and its medical annotation according to the UMLS. Because GEDAW is a liver gene expression data warehouse, we have paid particular attention to medical knowledge, so that biological mechanisms and medical knowledge can be correlated with experimental data. The paper discusses the data sources and the transformation process applied to resolve syntactic and semantic conflicts between the source formats and the GEDAW schema.
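The schema mapping approach mentioned for the GenBank source can be illustrated with a minimal sketch. It assumes a simplified, hypothetical GenBank-style XML record (the element names `locus`, `definition`, `feature`, `key`, `from` and `to` are illustrative, not the actual GenBank XML schema) and a hypothetical target schema with `accession`, `description`, `feature_type`, `start` and `end` attributes. It is not GEDAW's actual mapping; it only shows how a declarative source-to-target mapping can resolve a syntactic conflict (positions stored as text versus integers) and a semantic one (source feature keys renamed into a warehouse vocabulary).

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified GenBank-style record. Element names and the
# accession "EXAMPLE_0001" are placeholders, not real GenBank XML content.
RECORD = """<Seq>
  <locus>EXAMPLE_0001</locus>
  <definition>Example gene mRNA</definition>
  <features>
    <feature><key>gene</key><from>1</from><to>900</to></feature>
    <feature><key>CDS</key><from>52</from><to>810</to></feature>
  </features>
</Seq>"""

# Hypothetical schema mapping: source element -> warehouse attribute.
GENE_MAPPING = {"locus": "accession", "definition": "description"}

# Hypothetical vocabulary used to resolve a semantic conflict:
# source feature keys are renamed to warehouse feature types.
FEATURE_VOCABULARY = {"gene": "gene", "CDS": "coding_sequence"}


def map_record(xml_text):
    """Map one XML record to a gene row plus a list of feature rows."""
    root = ET.fromstring(xml_text)
    # Apply the declarative mapping to the record-level fields.
    gene = {target: root.findtext(src) for src, target in GENE_MAPPING.items()}
    features = []
    for feat in root.iter("feature"):
        key = feat.findtext("key")
        features.append({
            "accession": gene["accession"],
            # Unknown keys are kept unchanged rather than dropped.
            "feature_type": FEATURE_VOCABULARY.get(key, key),
            # Syntactic conflict: positions are text in XML, integers here.
            "start": int(feat.findtext("from")),
            "end": int(feat.findtext("to")),
        })
    return gene, features


gene, features = map_record(RECORD)
print(gene)  # {'accession': 'EXAMPLE_0001', 'description': 'Example gene mRNA'}
print(features)
```

Keeping the mapping in dictionaries rather than in code paths means a change in the source format only requires editing the mapping tables, which is the usual motivation for a schema mapping approach.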
Document type:
Conference paper
Proceedings of the 2nd Intl. Workshop on Data Integration in the Life Sciences (DILS 2005), Jul 2005, San Diego, United States. Lecture Notes in Computer Science, vol. 3615

https://hal.inria.fr/hal-01856023
Contributor: Laure Berti-Equille
Submitted on: Thursday, 9 August 2018, 12:44:11
Last modified on: Saturday, 11 August 2018, 01:10:33

File

DILS2005.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01856023, version 1

Citation

Émilie Guerin, G. Marquet, Anita Burgun, Olivier Loréal, Laure Berti-Équille, et al. Integrating and Warehousing Liver Gene Expression Data and Related Biomedical Resources in GEDAW. Proceedings of the 2nd Intl. Workshop on Data Integration in the Life Sciences (DILS 2005), Jul 2005, San Diego, United States. Lecture Notes in Computer Science, vol. 3615. 〈hal-01856023〉
