Automatic discovery of 100-miRNA signature for cancer classification using ensemble feature selection - Inria - Institut national de recherche en sciences et technologies du numérique
Journal article, BMC Bioinformatics, Year: 2019

Abstract

Background: MicroRNAs (miRNAs) are noncoding RNA molecules heavily involved in human tumors, a few of which circulate in the human body. Finding a tumor-associated miRNA signature, that is, the minimal set of miRNAs that must be measured to discriminate among different types of cancer as well as between tumor and normal tissues, is of utmost importance. Feature selection techniques applied in machine learning can help; however, they often provide naive or biased results.

Results: An ensemble feature selection strategy for miRNA signatures is proposed. miRNAs are chosen based on a consensus on feature relevance across high-accuracy classifiers of different types. This methodology aims to identify signatures that are considerably more robust and reliable when used in clinically relevant prediction tasks. Using the proposed method, a 100-miRNA signature is identified in a dataset of 8023 samples extracted from TCGA. When eight state-of-the-art classifiers are run with the 100-miRNA signature instead of the original 1046 features, global accuracy differs by only 1.4%. Importantly, this 100-miRNA signature is sufficient to distinguish between tumor and normal tissues. The approach is then compared against other feature selection methods: univariate feature selection (UFS), recursive feature elimination (RFE), elastic net (EN), LASSO, genetic algorithms, and EFS-CLA. The proposed approach provides better accuracy when tested with 10-fold cross-validation using different classifiers. It is also applied to several GEO datasets from different platforms, with some classifiers reaching more than 90% classification accuracy, demonstrating its cross-platform applicability.

Conclusions: The 100-miRNA signature is sufficiently stable to provide almost the same classification accuracy as the complete TCGA dataset, and it is further validated on several GEO datasets across different types of cancer and platforms. Furthermore, a bibliographic analysis confirms that 77 of the 100 miRNAs in the signature appear, in stem-loop or mature-sequence form, in lists of circulating miRNAs used in cancer studies. The remaining 23 miRNAs offer potentially promising avenues for future research.
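The consensus step summarized in the Results (keeping features that high-accuracy classifiers of different types agree are relevant) can be illustrated with a minimal Python sketch. It assumes each trained classifier exposes a cross-validated accuracy and one relevance score per miRNA (for example, absolute linear coefficients or tree-based importances). The function names, the `min_accuracy` threshold, the top-`n_top` voting rule, and the toy data are hypothetical illustrations, not the authors' published implementation.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch of consensus-based ensemble feature selection.
# Each model is represented as (accuracy, {feature_name: relevance_score}).
Model = Tuple[float, Dict[str, float]]


def top_features(relevance: Dict[str, float], n: int) -> List[str]:
    """Return the n features with the largest absolute relevance score."""
    # Sort by |score| descending; ties are broken alphabetically for determinism.
    ranked = sorted(relevance, key=lambda f: (-abs(relevance[f]), f))
    return ranked[:n]


def consensus_signature(
    models: Sequence[Model], n_top: int, k: int, min_accuracy: float
) -> List[str]:
    """Pick the k features that appear most often in the top-n_top lists
    of models whose accuracy is at least min_accuracy (assumed voting rule)."""
    votes: Counter = Counter()
    for accuracy, relevance in models:
        if accuracy < min_accuracy:
            continue  # only high-accuracy models get a vote
        votes.update(top_features(relevance, n_top))
    # Most votes first; ties are broken alphabetically.
    ranked = sorted(votes, key=lambda f: (-votes[f], f))
    return ranked[:k]


# Hypothetical toy input: three models scoring four placeholder miRNAs.
models = [
    (0.95, {"miR_1": 0.9, "miR_2": -0.7, "miR_3": 0.1, "miR_4": 0.05}),
    (0.93, {"miR_1": 0.4, "miR_2": 0.2, "miR_3": 0.6, "miR_4": -0.01}),
    (0.60, {"miR_1": 0.0, "miR_2": 0.1, "miR_3": 0.2, "miR_4": 0.9}),
]
print(consensus_signature(models, n_top=2, k=2, min_accuracy=0.9))  # ['miR_1', 'miR_2']
```

Counting votes over each model's top-ranked features, rather than averaging raw scores, keeps classifiers whose relevance measures live on different scales comparable; this is one plausible way to form the consensus, assumed here for illustration. In the toy run, the third model is excluded by the accuracy threshold, so `miR_4`, which only it ranks highly, does not enter the signature.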
Main file
s12859-019-3050-8.pdf (3.59 MB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-02344257, version 1 (04-11-2019)

License

Attribution

Identifiers

Cite

Alejandro Lopez-Rincon, Marlet Martinez-Archundia, Gustavo U Martinez-Ruiz, Alexander Schoenhuth, Alberto Tonda. Automatic discovery of 100-miRNA signature for cancer classification using ensemble feature selection. BMC Bioinformatics, 2019, 20 (1), pp.1-17. ⟨10.1186/s12859-019-3050-8⟩. ⟨hal-02344257⟩
147 views
185 downloads
