Ensemble Learning for Large Scale Virtual Screening on Apache Spark

Abstract: Virtual screening (VS) is an in-silico drug discovery technique that aims to identify candidate drugs by computationally screening large libraries of small molecules. Various ligand-based and structure-based virtual screening approaches have been proposed over the last decades. Machine learning (ML) techniques have been widely applied in the drug discovery and development process, predominantly in ligand-based virtual screening. Ensemble learning is a common paradigm in ML in which multiple models are trained on data for the same problem and their results are then combined into a single, improved prediction. Applying VS to massive molecular libraries (Big Data) is computationally intensive, so splitting the data into chunks in order to parallelize and distribute the task has become necessary. For many years, MapReduce has been successfully applied on clusters to solve problems involving very large datasets, although with some limitations. Apache Spark is an open-source framework for Big Data processing that overcomes the shortcomings of MapReduce. In this paper, we propose a new approach based on the ensemble learning paradigm in Apache Spark to improve both the execution time and the precision of large-scale virtual screening. We generate a new training dataset to evaluate our approach. The experimental results show good predictive performance, with a precision of up to 92%, at an acceptable execution time.
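The abstract describes ensemble learning as training several models on data for the same problem and combining their outputs, with the data split into chunks so the work can be distributed. The sketch below illustrates that general pattern in plain Python; it is not the authors' Spark implementation, whose base learners and combination rule are not given on this page. The round-robin chunking (standing in for Spark partitions), the nearest-centroid base learner, the majority vote, and the toy two-descriptor "molecules" are all assumptions chosen for illustration, and helper names such as `split_into_chunks` and `ensemble_predict` are hypothetical. It also computes precision, the metric quoted in the abstract, as TP / (TP + FP).

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: each molecule is a tuple of numeric
# descriptors, labelled 1 (active) or 0 (inactive).
Sample = Tuple[Tuple[float, ...], int]
Model = Dict[int, List[float]]


def split_into_chunks(data: Sequence, n_chunks: int) -> List[list]:
    """Round-robin split into n_chunks disjoint chunks (stand-in for Spark partitions)."""
    return [list(data[i::n_chunks]) for i in range(n_chunks)]


def train_centroid_model(chunk: List[Sample]) -> Model:
    """Assumed base learner: nearest-centroid, i.e. one mean descriptor vector per class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in chunk:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_one(model: Model, x: Tuple[float, ...]) -> int:
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, model[y])))


def ensemble_predict(models: List[Model], x: Tuple[float, ...]) -> int:
    """Majority vote over the base models; a tie goes to the label counted first."""
    votes = Counter(predict_one(m, x) for m in models)
    return votes.most_common(1)[0][0]


def precision(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """Precision = TP / (TP + FP) for the positive ("active") class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0


# Made-up training set: actives cluster near (1, 1), inactives near (0, 0).
train = [
    ((0.9, 1.0), 1), ((0.1, 0.0), 0),
    ((1.0, 0.8), 1), ((0.2, 0.1), 0),
    ((0.8, 0.9), 1), ((0.0, 0.2), 0),
]
# One base model per chunk; on a cluster each chunk could be trained in parallel.
models = [train_centroid_model(c) for c in split_into_chunks(train, 3)]

# Made-up test molecules; the third is a borderline inactive.
test_x = [(0.95, 0.9), (0.05, 0.1), (0.6, 0.7)]
test_y = [1, 0, 0]
pred = [ensemble_predict(models, x) for x in test_x]
print(pred, precision(test_y, pred))
```

On a Spark cluster, each chunk would plausibly correspond to a partition processed by a worker, and only the small per-chunk models would need to be gathered for voting; that mapping is an assumption about how such a pipeline could be organized, not a description of the paper's design.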
Document type: Conference paper
Abdelmalek Amine; Malek Mouhoub; Otmane Ait Mohamed; Bachir Djebbar. 6th IFIP International Conference on Computational Intelligence and Its Applications (CIIA), May 2018, Oran, Algeria. Springer International Publishing, IFIP Advances in Information and Communication Technology, AICT-522, pp.244-256, 2018, Computational Intelligence and Its Applications. 〈10.1007/978-3-319-89743-1_22〉

Cited literature: 7 references

https://hal.inria.fr/hal-01913905
Contributor: Hal Ifip
Submitted on: Wednesday, November 7, 2018 - 10:35:02 AM
Last modified on: Thursday, November 8, 2018 - 1:20:10 PM
Document(s) archived on: Friday, February 8, 2019 - 1:00:54 PM

File

Restricted access
To satisfy the publisher's distribution rights, the document is embargoed until 2021-01-01.


Licence


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Karima Sid, Mohamed Batouche. Ensemble Learning for Large Scale Virtual Screening on Apache Spark. Abdelmalek Amine; Malek Mouhoub; Otmane Ait Mohamed; Bachir Djebbar. 6th IFIP International Conference on Computational Intelligence and Its Applications (CIIA), May 2018, Oran, Algeria. Springer International Publishing, IFIP Advances in Information and Communication Technology, AICT-522, pp.244-256, 2018, Computational Intelligence and Its Applications. 〈10.1007/978-3-319-89743-1_22〉. 〈hal-01913905〉
