J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008.

Apache Hadoop.

M. Zaharia, R. S. Xin, P. Wendell, T. Das, M. Armbrust et al., Apache Spark: a unified engine for Big Data processing, Communications of the ACM, vol.59, issue.11, pp.56-65, 2016.

K. Siddique, Z. Akhtar, E. J. Yoon, Y. Jeong, D. Dasgupta et al., Apache Hama: an emerging bulk synchronous parallel computing framework for Big Data applications, IEEE Access, vol.4, pp.8879-8887, 2016.

J. Veiga, R. R. Expósito, G. L. Taboada, and J. Touriño, FlameMR: an event-driven architecture for MapReduce applications, Future Generation Computer Systems, vol.65, pp.46-56, 2016.

J. Veiga, R. R. Expósito, G. L. Taboada, and J. Touriño, Enhancing in-memory efficiency for MapReduce-based data processing, Journal of Parallel and Distributed Computing, vol.120, pp.323-338, 2018.

K. Shvachko, H. Kuang, S. Radia, and R. Chansler, The Hadoop Distributed File System, IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST'2010), pp.1-10, 2010.

D. Yang, X. Zhong, D. Yan, F. Dai, X. Yin et al., NativeTask: a Hadoop compatible framework for high performance, 2013 IEEE International Conference on Big Data, pp.94-101, 2013.

Z. Fadika, E. Dede, M. Govindaraju, and L. Ramakrishnan, MARIANE: using MapReduce in HPC environments, Future Generation Computer Systems, vol.36, pp.379-388, 2014.

M. Wasi-ur-Rahman, N. S. Islam, X. Lu, J. Jose, H. Subramoni et al., High-performance RDMA-based design of Hadoop MapReduce over InfiniBand, 27th IEEE International Parallel and Distributed Processing Symposium Workshops and PhD Forum (IPDPSW'13), pp.1908-1917, 2013.

Z. Zhang, K. Barbary, F. A. Nothaft, E. R. Sparks, O. Zahn et al., Kira: processing astronomy imagery using Big Data technology, IEEE Transactions on Big Data, 2016.
DOI: 10.1109/tbdata.2016.2599926

O. Gutsche, M. Cremonesi, P. Elmer, B. Jayatilaka, J. Kowalkowski et al., Big Data in HEP: a comprehensive use case study, Journal of Physics: Conference Series, vol.898, issue.7, p.072012, 2017.
DOI: 10.1088/1742-6596/898/7/072012
URL: http://iopscience.iop.org/article/10.1088/1742-6596/898/7/072012/pdf

R. Brun and F. Rademakers, ROOT: an object oriented data analysis framework, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol.389, issue.1-2, pp.81-86, 1997.
DOI: 10.1016/s0168-9002(97)00048-x

M. Tatineni, X. Lu, D. Choi, A. Majumdar, and D. K. Panda, Experiences and benefits of running RDMA-Hadoop and Spark on SDSC Comet, 5th Annual Conference on Diversity, Big Data, and Science at Scale (XSEDE'16), vol.23, p.5, 2016.

X. Lu, D. Shankar, S. Gugnani, and D. K. Panda, High-performance design of Apache Spark with RDMA and its benefits on various workloads, 2016 IEEE International Conference on Big Data, pp.253-262, 2016.

Y. Tang, H. Guo, T. Yuan, Q. Wu, X. Li et al., OEHadoop: accelerate Hadoop applications by co-designing Hadoop with Data Center Network, IEEE Access, vol.6, pp.25-849, 2018.
DOI: 10.1109/access.2018.2830799
URL: https://doi.org/10.1109/access.2018.2830799

Y. Chen, S. Alspaugh, and R. Katz, Interactive analytical processing in Big Data systems: a cross-industry study of MapReduce workloads, Proceedings of the VLDB Endowment, vol.5, pp.1802-1813, 2012.

A. Shinnar, D. Cunningham, V. Saraswat, and B. Herta, M3R: increased performance for in-memory Hadoop jobs, Proceedings of the VLDB Endowment, vol.5, pp.1736-1747, 2012.

D. Yan, X. Yin, C. Lian, X. Zhong, X. Zhou et al., Using memory in the right way to accelerate Big Data processing, Journal of Computer Science and Technology, vol.30, issue.1, pp.30-41, 2015.

P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra et al., X10: an object-oriented approach to non-uniform cluster computing, 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA'05), pp.519-538, 2005.

B. Lange and T. Nguyen, A Hadoop use case for engineering data, 12th International Conference on Cooperative Design, Visualization and Engineering (CDVE'15), pp.134-141, 2015.
DOI: 10.1007/978-3-319-24132-6_16
URL: https://hal.archives-ouvertes.fr/hal-01167510

Apache HBase, Hadoop distributed Big Data store, 2018.

Grid'5000: large-scale resource provisioning network, 2018.

C. Chen, Y. Chang, W. Chung, D. Lee, and J. Ho, CloudRS: an error correction algorithm of high-throughput sequencing data based on scalable framework, 2013 IEEE International Conference on Big Data, pp.717-722, 2013.

S. Gnerre, I. MacCallum, D. Przybylski, F. J. Ribeiro, J. N. Burton et al., High-quality draft assemblies of mammalian genomes from massively parallel sequence data, Proceedings of the National Academy of Sciences, vol.108, issue.4, pp.1513-1518, 2011.

DDBJ Sequence Read Archive (DRA), 2018.

J. Veiga, J. Enes, R. R. Expósito, and J. Touriño, BDEv 3.0: energy efficiency and microarchitectural characterization of Big Data processing frameworks, Future Generation Computer Systems, vol.86, pp.565-581, 2018.

R. R. Expósito, J. Veiga, J. González-Domínguez, and J. Touriño, MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud, Bioinformatics, vol.33, issue.17, pp.2762-2764, 2017.

J. González-Domínguez and B. Schmidt, ParDRe: faster parallel duplicated reads removal tool for sequencing studies, Bioinformatics, vol.32, issue.10, pp.1562-1564, 2016.