
Towards a global methodology for mining cohorts with biological and genetic data

Abstract:

Objective: Biomedical data collected in cohorts are complex to analyse because of their diversity and size. This paper introduces an expert-guided methodology for extensively mining such data. In this study, the expert is interested in extracting profiles that show the co-occurrence of biological parameters with specific genetic polymorphisms.

Methods and materials: Experiments were performed with CORON, a data mining platform designed to fit the methodology. CORON provides symbolic data mining methods, i.e. frequent pattern search and association rule extraction. The results found with the methodology were validated by statistical tests. The real data used come from the STANISLAS cohort, a population study comprising information collected from healthy French families over ten years. Because the STANISLAS cohort is composed of healthy individuals, the methodology described here is oriented towards extracting less frequent patterns.

Results: Two study frameworks illustrate the methodology. The first (framework #1) deals with interactions involving lipids; the second (framework #2) concerns the metabolic syndrome. Some of the results are reported below. In framework #1, in men, a significant interaction was found between BMI > 25 and the APOB Thr71Ile polymorphism on LDL-cholesterol concentration (p = 0.009). In women, we detected a potential genetic profile protective against cardiovascular diseases, involving an interaction between the APOE and APOB genes (p = 0.016). In framework #2, the distribution of genotypes of the APOB71 polymorphism differs significantly depending on whether or not an individual presents the metabolic syndrome (p = 0.03).

Conclusion: The results obtained show the capability of the data mining methodology, performed with CORON, to suggest new biological hypotheses to the expert; these hypotheses deserve further analysis in larger genetic epidemiology studies or wet laboratory experiments.
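The abstract names two symbolic mining steps, frequent pattern search and association rule extraction, without detailing them. Below is a minimal Python sketch of both, under stated assumptions: it uses a plain level-wise (Apriori-style) search, not CORON's own algorithms, and the function names `frequent_itemsets` and `association_rules` as well as the toy transactions (binary attributes such as `bmi_gt_25` or `snp_variant`) are hypothetical, invented for illustration and not taken from the STANISLAS data. Each individual is treated as a transaction, i.e. the set of binary attributes (discretized biological parameters, genotypes) that hold for that person.

```python
from itertools import combinations

# Hypothetical toy data: one set of binary attributes per individual.
# Invented for illustration only; NOT STANISLAS cohort data.
TRANSACTIONS = [
    {"bmi_gt_25", "snp_variant", "ldl_high"},
    {"bmi_gt_25", "snp_variant", "ldl_high"},
    {"bmi_gt_25", "ldl_high"},
    {"snp_variant"},
    {"bmi_gt_25", "snp_variant", "ldl_high"},
    {"ldl_normal"},
]


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search.

    Returns {itemset: relative support} for every itemset whose support
    (fraction of transactions containing it) is >= min_support.
    """
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]
    level = [frozenset([i]) for i in sorted({i for t in sets for i in t})]
    result = {}
    k = 1
    while level:
        # Count how many transactions contain each candidate of size k.
        counts = {c: sum(1 for t in sets if c <= t) for c in level}
        frequent = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates and keep only those
        # whose k-subsets are all frequent (Apriori downward-closure pruning).
        k += 1
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        level = list(candidates)
    return result


def association_rules(itemsets, min_confidence):
    """Derive rules lhs -> rhs from frequent itemsets.

    confidence = support(lhs | rhs) / support(lhs); every subset of a
    frequent itemset is itself frequent, so support(lhs) is always known.
    Returns a list of (lhs, rhs, support, confidence) tuples.
    """
    rules = []
    for itemset, supp in itemsets.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = supp / itemsets[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules


if __name__ == "__main__":
    fi = frequent_itemsets(TRANSACTIONS, min_support=0.3)
    for lhs, rhs, supp, conf in association_rules(fi, min_confidence=0.9):
        print(sorted(lhs), "->", sorted(rhs), f"supp={supp:.2f} conf={conf:.2f}")
```

In a sketch like this, lowering `min_support` is the simplest way to let less frequent profiles through, at the cost of a larger search space; rules found this way would still need the kind of statistical validation the abstract describes.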
Contributor: Sandy Maumus
Submitted on: Monday, July 3, 2006 - 4:21:35 PM
Last modification on: Friday, February 26, 2021 - 3:28:05 PM
HAL Id: inria-00000640, version 1


Sandy Maumus, Laszlo Szathmary, Amedeo Napoli, Eliane Albuisson, Sophie Visvikis-Siest. Towards a global methodology for mining cohorts with biological and genetic data. [Research Report] 2005, pp.28. ⟨inria-00000640⟩


