K. M. Tolle, The fourth paradigm: Data-intensive scientific discovery, Proceedings of the IEEE, pp.1334-1337, 2011.

J. Dean and S. Ghemawat, MapReduce, Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association, 2004.
DOI : 10.1145/1327452.1327492

M. Zaharia, Resilient Distributed Datasets, Proceedings of the 9th USENIX Conference on NSDI. USENIX Association, 2012.
DOI : 10.1145/2886107.2886110

S. Ewen, Spinning fast iterative data flows, Proceedings of the VLDB Endowment, vol.5, issue.11, pp.1268-1279, 2012.
DOI : 10.14778/2350229.2350245
URL : http://arxiv.org/abs/1208.0088

J. Shi, Clash of the titans, Proceedings of the VLDB Endowment, pp.2110-2121, 2015.
DOI : 10.14778/2831360.2831365

D. Warneke and O. Kao, Exploiting Dynamic Resource Allocation for Efficient Parallel Data Processing in the Cloud, IEEE Transactions on Parallel and Distributed Systems, vol.22, issue.6, pp.985-997, 2011.
DOI : 10.1109/TPDS.2011.65

M. Armbrust, Scaling spark in the real world, Proceedings of the VLDB Endowment, pp.1840-1843, 2015.
DOI : 10.14778/2824032.2824080

E. Yildirim and T. Kosar, Network-aware end-to-end data throughput optimization, Proceedings of the first international workshop on Network-aware data management, NDM '11, pp.21-30, 2011.
DOI : 10.1145/2110217.2110221

T. J. Hacker, Adaptive data block scheduling for parallel TCP streams, Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing (HPDC-14), pp.265-275, 2005.
DOI : 10.1109/HPDC.2005.1520970

H. Li, Tachyon, Proceedings of the ACM Symposium on Cloud Computing, SOCC '14, pp.1-6, 2014.
DOI : 10.1145/2670979.2670985

D. Warneke and O. Kao, Nephele, Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers, MTAGS '09, 2009.
DOI : 10.1145/1646468.1646476

M. Armbrust, Spark SQL, Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pp.1383-1394, 2015.
DOI : 10.1145/2723372.2742797

B. Lohrmann, Nephele streaming: stream processing under QoS constraints at scale, Cluster Computing, vol.17, issue.1, pp.61-78, 2014.
DOI : 10.1007/s10586-013-0281-8

M. Zaharia, Discretized streams, Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, pp.423-438, 2013.
DOI : 10.1145/2517349.2522737

J. E. Gonzalez, Graphx: Graph processing in a distributed dataflow framework, Proceedings of the 11th USENIX Conference OSDI. Berkeley: USENIX Association, pp.599-613, 2014.

R. Tudoran, TomusBlobs: Towards Communication-Efficient Storage for MapReduce Applications in Azure, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), pp.427-434, 2012.
DOI : 10.1109/CCGrid.2012.104
URL : https://hal.archives-ouvertes.fr/hal-00670725

F. Clemente, Enabling Big Data Analytics in the Hybrid Cloud using Iterative MapReduce, UCC'15: The 8th IEEE/ACM Intl. Conf. on Utility and Cloud Computing, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01207186

G. Ananthanarayanan, Grass: Trimming stragglers in approximation analytics, Proceedings of the 11th USENIX Conf. NSDI, pp.289-302, 2014.

Y. Kwon, SkewTune, Proceedings of the 2012 international conference on Management of Data, SIGMOD '12, pp.25-36, 2012.
DOI : 10.1145/2213836.2213840

K. Ousterhout, Making sense of performance in data analytics frameworks, Proceedings of the 12th USENIX Conf. NSDI, pp.293-307, 2015.

T. Akidau, The dataflow model, Proceedings of the VLDB Endowment, pp.1792-1803, 2015.
DOI : 10.14778/2824032.2824076

R. Bolze, Grid'5000: A Large Scale And Highly Reconfigurable Experimental Grid Testbed, International Journal of High Performance Computing Applications, vol.20, issue.4, pp.481-494, 2006.
DOI : 10.1177/1094342006070078
URL : https://hal.archives-ouvertes.fr/hal-00684943